At PlatformCon: For Realtor.com, Success Is Driven by Stories
https://thenewstack.io/at-platformcon-for-realtor-com-success-is-driven-by-stories/ (Tue, 13 Jun 2023)

You’re only as good as the stories you tell. Storytelling, after all, is a tenet of humanity, and the best way to pass information, at least when it’s anchored in context. It’s also a pillar of successful sales, no matter what you’re selling or who you’re selling it to.

For platform engineering, your eager or not-so-eager audience is made up of your colleagues: the internal developers as well as other company-wide stakeholders and influencers. You have to understand the context and needs of your different target personas, and how they could respond to the changes you’re making. Much of intentional developer experience and platform adoption hinges on your ability to convey what works and what hasn’t, often Socratically repeating back what you’ve heard to be sure you comprehend your stakeholders’ stakes — and making sure they feel heard.

For Realtor.com, a platform engineering mindset is anchored in the power of success stories. Suzy Julius, SVP of product and engineering, joined the virtual PlatformCon stage to share how the top U.S. real estate site, with 86 million visits per month, went from a culture where you couldn’t say platform to a culture that embraces it.

The First Step Is Always Recognition

Realtor.com is a company that has, over the last couple of years, scaled mainly via acquisition, which often results in either spaghetti architecture or a complete lack of visibility into other business units. Growth by acquisition pretty much always signals an increase in complexity.

“Our tech stack became extremely complex, slowing down our ability to build features in a fast and reliable way,” Julius said. “The existing tech stack made it difficult to ensure a quality product or ensure reliable feature releases.”

Facing its divergent and often duplicated tech ecosystem, in 2020, the company embarked on a transformation, with the aim to “simplify to scale” in order to accelerate innovation.

A platform emerged as the solution.

When Julius joined the company at the start of 2021, her team recognized the common barriers to platform adoption, chief among them “knowing that there was a reluctance to building a platform, with fear that one would slow down the engineering team by creating more complexity.” Not an uncommon hurdle for platform engineers to face.

So the platform team kicked off this journey by gathering feedback from a diverse set of stakeholders, not just from engineering but also from business and security, and offered a compelling success story, she explained. Now, 150 people are considered part of the platform organization — a mix of product leaders and engineers, who she said are all “focused on developer experience, data, content and personalization.”

Next, It’s Time to Adopt a Product Mindset

Come 2022, the platform team was embracing a platform mindset, concentrating on developer enablement and providing a service to its colleagues. Specifically, Julius outlined the aims as:

  • To provide a service to others that helps everyone move faster and more reliably.
  • To understand as a platform team the vision and principles, and then to get corporate buy-in.
  • To be able to show short-term and long-term wins.
  • To measure, iterate and evangelize the vision to be a platform empowering all products and unlocking business opportunities.

These goals, she said, mostly focused on developer experience, but they also created a data platform for a “clear line of sight to understand business metrics or give analytics the ability to create a canonical source of truth dataset for our consumer and customers.”

The tech stack that drove this sociotechnical change included:

  • For developer experience — CircleCI, Apollo supergraph, GraphQL, Amazon EKS, Argo CD, Tyk API gateway, Vault developer portal
  • For data, content and personalization — Fivetran automated data movement platform, Snowflake for data warehousing, Apache Kafka, dbt for data transformation, Apache Airflow, Node.js, Amazon SageMaker for machine learning, Optimizely, Metaflow data science framework, Elasticsearch

All the platform tech, people and processes are aligned around the vision to become the preferred platform on which their internal customers choose to build. That is grounded, Julius explained, in connecting wins with features that drive business metrics, namely, revenue and/or user engagement.

She highlighted sociotechnical lessons they learned over the past year:

  • A platform mindset is not just a technical but a cultural shift.
  • Adoption hinges on training, documentation and awareness.
  • You need a tighter feedback loop to establish stakeholder sentiment.
  • Be aware not to over-index on microservices. For example, they had rate-limiting in different locations, which Julius said made it hard to build client features.
  • Align around a few programming languages, as too many make it much harder to build cross-company platform features like logging and monitoring.
  • And, in a time of tighter budgets, make sure you commit to continuously invest in your platform service, no matter what.

Keep up the Momentum

Now, this year at Realtor.com is all about embracing the Platform as a Product mindset and building a differentiated, self-service product suite. Treating your platform as a product is about treating your developers like your customers, always focusing on improving developer experience or DevEx. For Realtor.com, this includes continuous feedback and stakeholder scorecards.

This year is about “understanding that we need to continue to solve problems and to make it easy and intuitive to use our platform,” Julius said. “And we need to realize gains beyond tech, [like] more involvement and input into what the platforms do and how they can help the entire company.”

Many of the platform engineering thought leaders The New Stack has interviewed have talked about the importance of using the platform as a single pane of glass to create a common language between business and engineering. This helps business understand the value of the big cost center that is engineering, while engineering can better connect their work to driving real business value to end customers. Julius’s team stands out in looking to leverage the platform to measure that effect. She said they are currently working “to incorporate how platforms impact our end-user strategy and experience,” connecting the DevEx to the DevOps.

They are also working out how to evangelize the platform internally. Like with all things, communication is key, including around onboarding and design-first thinking. They are customizing their messaging for different stakeholders. Julius noted they all have to get comfortable repeating themselves to not get lost in the email and Slack cacophony. The platform team is also considering adopting a tool like Backstage to help facilitate that internal product marketing and to, as she said, “bring it all together.”

All this feeds into a continued highlighting of performance, security and reliability gains.

Julius next turned to their playbook: identity (start with the end state and vision), principles and self-awareness, a first-team mindset, reputation and brand, execution and barriers, and the importance of failure.

How Mature Is Your Platform?

Platform teams are cost centers, but, until recently, developer productivity wasn’t something that could be easily measured. This means that platform teams have had difficulty assessing their performance and impact. Last month, a new DevEx framework came out that examines developers’ flow state, feedback loops, and cognitive load.

The month before, the Syntasso team open sourced its Platform Maturity Model, which guides teams in answering the following questions:

  • How does the company value (and therefore fund) platform efforts?
  • What compels users to start, and be successful, using your platform?
  • How do users interact with and consume offerings on your platform?
  • How are requests and requirements identified and prioritized on your platform?
  • How does product engineering manage non-differentiating (and often internally common) tooling and infrastructure?
  • How does each business requirement (e.g. compliance or performance) get enabled by platform offerings?

Each of these questions has answers ranging from Level 1 through Level 4 to mark the maturity of a platform team.
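
As a back-of-the-envelope illustration (not part of Syntasso’s model itself), a team could record a level per question and roll the answers up into a single score; the question keys below are hypothetical shorthand for the six questions above:

```python
# Hypothetical sketch: rolling up per-question maturity levels (1-4) into one score.
# The question keys are illustrative shorthand, not Syntasso's actual rubric.
MATURITY_QUESTIONS = [
    "funding", "adoption", "interaction", "prioritization", "tooling", "compliance",
]

def overall_maturity(levels: dict[str, int]) -> float:
    """Average the per-question levels (each 1 through 4) into a single score."""
    missing = [q for q in MATURITY_QUESTIONS if q not in levels]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    if not all(1 <= level <= 4 for level in levels.values()):
        raise ValueError("each answer must be a level from 1 to 4")
    return sum(levels.values()) / len(levels)

# Example: strong on tooling, weak on funding and prioritization.
print(overall_maturity({
    "funding": 1, "adoption": 2, "interaction": 3,
    "prioritization": 1, "tooling": 4, "compliance": 2,
}))  # -> roughly 2.17
```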

The Realtor.com platform team has created what it refers to as a playbook — an artifact that helps continuously build onto the organization’s Platform-as-a-Product culture. This includes their own maturity model. “It’s recognizing and reminding us that we don’t want to stop at a platform that just works, but we want to be seen for the good and invested in,” Julius said.

Pulling a metaphor from the company’s core industry, she compared a platform to a house. There are parts that you don’t really notice until something goes wrong, like a window that won’t open or a cracked foundation. “Where we strive to mature as a platform is when you notice the doors, you notice the windows, and they’re seen for the good,” she explained.

Next, the playbook features two decision-making frameworks to decide when to slow down or speed up. She called them flywheels that show how they make decisions collaboratively and cross-functionally, “in a way that we can keep coming back and pointing at that decision as we progress.” They are:

  • Strategic technical initiative group (STIG) — to ensure technical decisions are made collaboratively and consider the future tech stack and feature development.
  • Cross-functional workshops — to collaborate and focus on both the Platform-as-a-Product and tech strategy.

Finally, the playbook centers on identity, which Julius said she could have given a whole talk about; it’s that essential to the Realtor.com product team. Identity leans into the importance of vision and purpose. A platform team always needs empathy, she argues, putting itself in its stakeholders’ shoes to better understand the technology and onboarding. It’s treating internal customers with the same level of care as external users.

Identity is all about understanding what a success story looks like and working backward to identify key aspects of that story, Julius explained, aligning that story with key decisions and remaining focused on the vision. It’s always about maintaining the organization’s reputation and grounding every decision in context.

“This is all about having the end state in mind, combining the fundamentals with your vision. It’s that compelling story of success.”

‘Running Service’ Blueprint for a Kubernetes Developer Portal
https://thenewstack.io/running-service-blueprint-for-a-kubernetes-developer-portal/ (Wed, 07 Jun 2023)

Internal developer portals exist to provide developers with a product-like experience that’s free of cognitive load, allowing developers to stay in the flow and be productive. They are set up by platform engineering teams to help developers serve themselves within guardrails and internal quality standards.

With portals, developers can simply and easily set up an ephemeral environment, restart a Kubernetes cluster, redeploy an image tag or scaffold a microservice. Platform engineering will make those actions reusable in the platform, and the internal developer portal will act as the interface to the platform and then reflect the changes in the software catalog.

But internal developer portals are more than loosely coupled product-like user interfaces that make developer lives easier. The internal developer portal also has a valuable software catalog that includes everything application-related in your engineering organization, from CI/CD metadata through cloud resources, Kubernetes, services and more.

The value of the software catalog is much greater than the metadata it contains (which is pretty neat, too) and goes way beyond showing who owns a service or where its logs are. In addition to being a single source of truth, its value comes from the way it provides context, especially in the case of runtime data. It can quickly answer questions such as, “What is the current running version of service x in environment y?” even in cases that involve feature flags, canary or blue/green deployments.

Context and runtime data are the focus of this article. We will provide a detailed example of an internal developer portal for Kubernetes objects. We will then show the power of the software catalog and the fact that it can support workflow automation — anything from time to live (TTL) termination through service locking, triggering automated actions when services degrade, etc. — as a result of its combination of metadata and runtime data.

Spotify’s Backstage C4 Model for Internal Developer Portals

Software catalogs need a data model, and before you begin, you need to define it. It’s nothing too complex, but you do need a schema identifying what needs to be inside your software catalog. Software catalogs need to be unopinionated and completely flexible, so the best option is to let you define the data model yourself.

In Port, the schema for a type of entity (let’s say a K8s cluster) is called a blueprint. The actual instance (the actual cluster in this case) is called an entity. In Spotify’s Backstage, the blueprint is called a “kind.”
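
To make the distinction concrete, here is a minimal sketch of the two concepts as plain Python dictionaries. The field names are illustrative, loosely modeled on Port-style JSON definitions, not an exact API reference:

```python
# Illustrative sketch of the blueprint/entity distinction; field names
# approximate Port-style JSON definitions and are not an exact API reference.
k8s_cluster_blueprint = {
    "identifier": "k8sCluster",          # the type: "a Kubernetes cluster"
    "schema": {
        "properties": {
            "region":  {"type": "string"},
            "version": {"type": "string"},
        }
    },
}

prod_cluster_entity = {
    "identifier": "prod-us-east-1",      # the instance: an actual cluster
    "blueprint": "k8sCluster",
    "properties": {"region": "us-east-1", "version": "1.27"},
}
```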

Backstage, a leading open source internal developer portal and the third most popular Cloud Native Computing Foundation (CNCF) project, recommends beginning with a certain data model consisting of six blueprints (or kinds).

  • Component
  • API
  • Resource
  • System
  • Domain
  • Group

As Spotify’s senior engineer Renato Kalman and staff engineer Johan Wallin explain here, in designing Backstage they had a software visualization challenge: They needed a “standardized software metadata model to create a common language for communicating software architecture.” What they came up with was the C4 model. You can see an example of a Backstage C4 model here.

But this data model misses one point: the “running service” blueprint.

What Is a Running Service?

Your code is not your app. The code that lives in your repo or in a container image isn’t the app. In real life, your app exists in an environment and serves something (APIs, other services, users) within an ecosystem of tools and dependencies. It behaves differently depending on where it is.

The running-service blueprint, or as we sometimes call it, “service in environment,” reflects the fact that a single “service” is usually deployed to many different environments. Services can live in a variety of environments: staging, development, production. Services can also live in many different customer environments, especially in the case of single-tenant architectures.

This simple fact that the service lives in many different environments is reflected by the idea of the “running service” blueprint in Port. The “running service” entity lets us see the service “in the wild” — in the specific environment it actually lives in. Only this provides us with the correct and actionable context to understand what is going on.

Sticking to a static software catalog with a static data model that only includes metadata and not runtime data doesn’t provide the context we need. Insights exist only if you look at the real instance of the running microservice.

A Kubernetes Internal Developer Portal: The ‘Running Service’ Blueprint

Some argue that the growth of Kubernetes is one of the core drivers behind platform engineering. Kubernetes complexity, the expertise required of its practitioners and the recent movement of many developers to cloud native development all created increased load and friction between developers and DevOps.

Internal developer portals abstract Kubernetes away for developers. They let developers understand Kubernetes by showing them the relevant data, in context. They also support developer self-service actions. It’s important to ensure that these Kubernetes internal developer portals include:

  • All Kubernetes objects in the software catalog, not just microservices
  • Multicluster support
  • CRD support

Let’s look at how to set up blueprints (the data model) for a Kubernetes internal developer portal and then at how and when we include the running service blueprint for Kubernetes.

This is the basic set of blueprints for Kubernetes:

Workload is the “running service” for Kubernetes. It is a generic name for stateful sets, deployments, daemon sets and any other workload running in the cluster.

  • A cluster represents a Kubernetes cluster in the infrastructure, providing the high-level connection between the different objects in the Kubernetes cluster.
  • A node is a server that hosts and provides the runtime for the different applications and microservices in the Kubernetes cluster.
  • A namespace is meant to group together many resources inside the same Kubernetes cluster, giving you the option to view how a complete environment hosted on the same Kubernetes cluster is connected.
  • The workload is meant to be the focal point that provides the most relevant context to a developer about how their app is doing. The workload entity provides the developer with an abstract view of their different workloads. They can see the current status of the workload, such as instance count and health. By going upstream in the dependency tree, the developer can see what other applications and microservices are running next to their own workload, letting the developer understand if there are any connectivity or functionality issues.
  • A pod is an instance of the workload, giving us visibility into the health of the pieces that make up the complete workload, as well as the ability to understand if there are any specific issues in the availability of the service provided by the workload.
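
Putting those pieces together, here is a hedged sketch of what a workload blueprint, and a health check over its runtime data, might look like. The field and relation names are illustrative, not Port’s exact schema:

```python
# Illustrative "running service" blueprint for Kubernetes: a workload entity
# carries runtime data (replicas, health) plus relations up the dependency tree.
workload_blueprint = {
    "identifier": "workload",
    "schema": {
        "properties": {
            "kind":          {"type": "string"},   # Deployment, StatefulSet, ...
            "image":         {"type": "string"},
            "replicas":      {"type": "integer"},
            "readyReplicas": {"type": "integer"},
            "healthStatus":  {"type": "string"},   # e.g. Healthy / Degraded
        }
    },
    "relations": {
        "namespace": {"target": "namespace", "many": False},
        "cluster":   {"target": "cluster",   "many": False},
    },
}

def is_healthy(workload_entity: dict) -> bool:
    """A workload counts as healthy when every desired replica is ready."""
    props = workload_entity["properties"]
    return props["readyReplicas"] >= props["replicas"]
```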

You Should Begin Using a Running Service or Workload Blueprint

We’ve seen that the runtime blueprint, regardless of whether we call it “running service,” “workload” or even the literal “service in environment,” is useful. It reflects the reality in which a single service usually exists in several environments at the same time, such as development, staging, etc. It can also be deployed in many different customer environments. The running service provides the runtime data so we can understand the service in the context of its environment and deployment, as well as its real-time information, from uptime to status.

You can use Port for free at getport.io, or check out a fully populated Port demo here.

Building GPT Applications on Open Source Stack LangChain
https://thenewstack.io/building-gpt-applications-on-open-source-stack-langchain/ (Wed, 07 Jun 2023)

This is the first of two articles.

Today, we see great eagerness to harness the power of generative pre-trained transformer (GPT) models and build intelligent and interactive applications. Fortunately, with the availability of open source tools and frameworks, like LangChain, developers can now leverage the benefits of GPT models in their projects. LangChain is a software development framework designed to simplify the creation of applications using large language models (LLMs). In this first article, we’ll explore three essential points that developers should consider when building GPT applications on the open source stack provided by LangChain. In the second article, we’ll work through a code example using LangChain to demonstrate its power and ease of use.
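
To set the stage, here is a minimal sketch of the kind of application LangChain enables: a prompt template wired to an LLM through a chain. It assumes the LLMChain API as of the LangChain releases current at the time of writing, plus an OpenAI API key in the environment:

```python
# Minimal LangChain sketch (assumes the mid-2023 LLMChain API).
# Setup assumed: pip install langchain openai, and OPENAI_API_KEY set in the env.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.7)  # wraps an OpenAI completion model

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} to a developer in two sentences.",
)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="retrieval-augmented generation"))
```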

Quality Data and Diverse Training

Building successful GPT applications depends upon the quality and diversity of the training data. GPT models rely heavily on large-scale datasets to learn patterns, understand context and generate meaningful outputs. When working with LangChain, developers must therefore prioritize the data they use for training. Consider the following three points to ensure data quality and diversity.

Data Collection Strategy

Define a comprehensive data collection strategy tailored to the application’s specific domain and use case. Evaluate available datasets, explore domain-specific sources and consider incorporating user-generated data for a more diverse and contextual training experience.

Data Pre-Processing

Dedicate time and resources to pre-processing the data. This will improve its quality, which, in turn, enhances the model’s performance. Cleaning the data, removing noise, handling duplicates and normalizing the format are essential, well-known pre-processing tasks. Use utilities for data pre-processing to simplify the transformation of raw data into a suitable format for GPT model training.
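
The specifics vary by dataset, but a minimal sketch of those steps (cleaning noise, handling duplicates, normalizing the format) on free-text records might look like this:

```python
import re
import unicodedata

def preprocess(records: list[str]) -> list[str]:
    """Clean, normalize and deduplicate raw text records for training."""
    seen, cleaned = set(), []
    for text in records:
        text = unicodedata.normalize("NFKC", text)   # normalize the format
        text = re.sub(r"<[^>]+>", " ", text)         # strip HTML remnants (noise)
        text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
        if not text:
            continue
        key = text.lower()
        if key in seen:                              # handle duplicates
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned

print(preprocess(["<p>Hello   world</p>", "hello world", "", "New   record"]))
# -> ['Hello world', 'New record']
```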

Ethical Considerations

There may be potential biases and ethical concerns within the data. GPT models have been known to amplify existing biases present in the training data. Therefore, regularly evaluate and address biases to ensure the GPT application is fair, inclusive and respects user diversity.

Fine-Tuning and Model Optimization

A pre-trained GPT model provides a powerful starting point, but fine-tuning is crucial to make it more contextually relevant and tailored to specific applications. Developers can employ various techniques to optimize GPT models and improve their performance. Consider the following three points for fine-tuning and model optimization.

Task-Specific Data

Gather task-specific data that aligns with the application’s objectives. Fine-tuning GPT models on relevant data helps them understand the specific nuances and vocabulary of the application’s domain, leading to more accurate and meaningful outputs.

Hyperparameter Tuning

Experiment with different hyperparameter settings during the fine-tuning process. Adjusting hyperparameters such as learning rates, batch sizes and regularization techniques can significantly affect the model’s performance. Use tuning capabilities to iterate and find the optimal set of hyperparameters for the GPT application.
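
A simple grid search is often enough to find a workable starting point. The sketch below uses illustrative values and a placeholder fine-tune-and-evaluate function standing in for whatever training loop the project uses:

```python
from itertools import product

# Illustrative grid; real values depend on the model and dataset.
LEARNING_RATES = [1e-5, 3e-5, 1e-4]
BATCH_SIZES = [8, 16, 32]

def fine_tune_and_evaluate(lr: float, batch_size: int) -> float:
    """Placeholder: fine-tune on task data and return a validation score."""
    raise NotImplementedError("plug in your training/evaluation loop here")

def grid_search():
    """Try every (learning rate, batch size) pair and keep the best scorer."""
    best_score, best_config = float("-inf"), None
    for lr, bs in product(LEARNING_RATES, BATCH_SIZES):
        score = fine_tune_and_evaluate(lr, bs)
        if score > best_score:
            best_score, best_config = score, (lr, bs)
    return best_config, best_score
```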

Iterative Feedback Loop

Continuously evaluate and refine the GPT application through an iterative feedback loop. This can include collecting user feedback, monitoring the application’s performance and incorporating improvements based on user interactions. Over time, this iterative approach helps maintain and enhance the application’s accuracy, relevance and user satisfaction.

User Experience and Deployment Considerations

Developers should not only focus on the underlying GPT models, but also on creating a seamless and engaging user experience for their applications. Additionally, deployment considerations play a vital role in ensuring smooth and efficient operation. Consider the following three points for user experience and deployment.

Prompt Design and Context Management

Craft meaningful and contextually appropriate prompts to guide user interactions with the GPT application. Provide clear instructions, set user expectations and enable users to customize and control the generated outputs. Effective prompt design contributes to a better user experience.

Scalable Deployment

Consider deployment strategies that ensure the scalability and efficiency of the GPT application. Use cloud services, containerization and serverless architectures to effectively handle varying workloads and user demands.

Continuous Monitoring

Implement a robust monitoring system to track the performance and usage patterns of the GPT application. Monitor resource utilization, response times and user feedback to identify potential bottlenecks and areas for improvement.

Summary

By considering these three key aspects — quality data and diverse training; fine-tuning and model optimization; and user experience and deployment considerations — developers can build powerful GPT applications on the open source stack provided by LangChain. In an upcoming article, I’ll start exploring the potential of GPT models and LangChain through a worked example. I will also host a workshop on June 22 during which I will go through building a ChatGPT application using LangChain. You can sign up here.

Can DevEx Metrics Drive Developer Productivity?
https://thenewstack.io/can-devex-metrics-drive-developer-productivity/ (Wed, 07 Jun 2023)


Developer experience, as it centers on human beings, is inherently sociotechnical. Yet much of the work of “DevEx” and developer productivity focuses solely on the technical — despite the long-held truth that happy workers are more productive. Technical leadership typically concentrates on measuring the output of developers or the time it takes for them to complete tasks — which makes for a whole lot of story points and not a lot of meaningful change.

Last month, a research paper entitled “DevEx: What Actually Drives Productivity” went viral around the software consultancy world. It outlines an approach to understanding DevEx and builds on a previously published actionable framework that combines developer feedback with data from engineering systems.

Neither paper provides a secret formula, but both aim to offer organizations potential areas on which to focus their measurements and improvements. After all, developer experience and software delivery as a whole are dependent on factors at the individual, team and organizational levels.

Especially during a time of trying to do more with less, gaining insights into getting more out of the significant engineering cost center is a valuable endeavor. Here’s how.

What Is DevEx and How Can You Measure It?

“Developer productivity is more important than ever. I mean, everyone has been saying that forever, but companies right now are really focused on efficiency and doing more with the developers they have,” Abi Noda, CEO and co-founder of DX developer insights platform, told The New Stack.

At the same time, software development is ever more complex, so that “with all the different tools and technologies that developers use today, just doing your job is getting harder and harder,” he continued. “And then there’s also the shift to hybrid and remote work. People are trying to understand how does that affect developers and/or the productivity of their workforces.” This trifecta makes it the perfect time to dive into developer productivity and improving developer experience.

To the authors of this white paper, “Developer experience focuses on the lived experience of developers and the points of friction they encounter in their everyday work.” It’s not just about productivity, but increased efficiency, product quality and employee retention. DevEx has also been defined as encompassing how developers feel about, think about and value their work — not exactly easily measurable subjects, which may be why, unfortunately, most companies aren’t looking to measure them.

The authors coalesce around three dimensions of developer experience:

  • Feedback loops – the time developers spend waiting to get their work done, and how streamlined teams can shorten that time
  • Cognitive load – in the ever-growing complexity of the cloud native world, organizations should look to limit hurdles to delivering value to customers
  • Flow state – when developers “get in the zone,” with limited distractions — meetings, unplanned work, ad-hoc requests for help — they feel energized by a greater sense of purpose

These form the three angles of a triangle, all feeding into each other.

Early on, the paper cites a 2020 McKinsey study which revealed that companies with better work environments for their developers boasted dramatically increased developer velocity, which in turn correlated with four to five times the revenue growth of their competitors. It’s therefore presumed that the above three dimensions are highly influential to velocity.

What influences that developer experience comes down to 25 sociotechnical factors — including interruptions and friction from tools or processes — which are evaluated by survey responses. This data is then combined with existing data from tools, like issue trackers and CI/CD pipelines, as well as the traditional KPIs and OKRs. Another powerful DevEx metric, particularly during these leaner times, is Knowledge Discovery Efficiency or KEDE, which leverages repositories to identify developer knowledge gaps.

No matter which measurements work for your organization, it should be a sociotechnical blend of perceptual measurements — like how developers feel, as ascertained via semi-frequent surveys — and more concrete developer workflows.

Focus on the Person or the Team?

Developer experience is highly personal and contextually dependent, Noda said, which is why the framework is unique in focusing heavily on the individual. But that creates a challenge: how to measure the individual while working to improve the collective experience.

Indeed, the paper calls out surveys as an important “fast and accurate” measurement tool. After these carefully designed surveys — asking things like “Based on X, how likely are you to…” — are regularly run, break down results and Net Promoter Scores (NPS) by team and developer persona, advises the paper. Noda clarified in our interview that these surveys should be anonymized or aggregated. It remains unclear how anonymous it can be on the average “two-pizza team” of five to nine people, and if it really can be individually actionable to aggregate results.
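
For reference, NPS itself is a standard calculation over 0–10 responses: the percentage of promoters (scores of 9–10) minus the percentage of detractors (0–6). A minimal sketch, computed per anonymized team:

```python
def net_promoter_score(responses: list[int]) -> float:
    """Standard NPS: % promoters (9-10) minus % detractors (0-6), from 0-10 ratings."""
    if not responses:
        raise ValueError("no responses")
    promoters = sum(1 for r in responses if r >= 9)
    detractors = sum(1 for r in responses if r <= 6)
    return 100 * (promoters - detractors) / len(responses)

# One (anonymized, aggregated) team's responses:
print(net_promoter_score([10, 9, 8, 7, 6, 9, 3]))  # -> roughly 14.3
```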

A key measurement of developer experience is how good you perceive you are at your job — because feeling good at your job is highly motivational and signals both a lessened cognitive load and an optimized flow state. However, this measurement brings its own slew of implicit biases that increase with intersections across demographics, role and experience.

After all, imposter syndrome is more likely if you are new to the tech industry or role and/or if you are from a marginalized group. Both of those circumstances would also make you feel less safe to reply honestly about flaws or hurdles. Add to all this that we are still in a time of tech layoffs, where morale may be down and people may feel less safe to speak up. On the other hand, optimization, particularly for the individual’s flow state, would likely increase inclusion of neurodivergent developers.

All of these concerns should be considered within your DevEx survey design. The same authors’ 2021 paper, “Actionable Framework for Understanding and Improving Developer Experience,” is a more in-depth work following interviews with 21 developers and developer leads — though, it notes, despite efforts, it included only one woman. This paper cites psychological safety as the single most important factor affecting developer experience.

Psychological safety in this instance could be defined as feeling safe to speak frankly about your experience. “On teams with good psychological safety and culture, developers are more willing to voice and tackle problems in order to continuously improve developer experience,” the paper reads, while a less safe culture discourages developers from speaking up or trying to make proactive improvements.

Focus on Flow

Embracing your flow doesn’t just mean going fast — it’s as much about reducing friction and frustration for more focused and happy developers.

“Flow metrics is about finding a sustainable pace that allows you to keep going infinitely,” Sophia Ashley, scrum master at AND Digital, told The New Stack. Pointing to how flow metrics are often misunderstood, she said, “It’s not necessarily about speeding up. Yes, they can help you increase your velocity, but it’s about finding a pace that works for you,” making lead time also an individual metric. Once you’ve reached repeated consistency, she explained, you can then look to increase your pace, but at your own level of sustainability — endless growth is simply unsustainable.

In the capitalistic world of move fast and break things, she said that this more controlled growth can be a tough pill to swallow, but it falls in line with the change in an industry that’s embracing responsibility and environmental, social and governance or ESG goals. And it helps reduce the developer burnout that’s rampant in this industry.

Following the DevOps philosophy, for Ashley, flow metrics are teaching your team how to deliver sustainably. “A lot of companies want to do big bang releases,” and things break, she said. It’s more sustainable to do small releases to “teach teams to undo.”

Prior to joining the tech industry in 2018, Ashley was a physical therapist, a field from which she draws a lot of comparisons, including post-injury training. “If they don’t exercise consistently, they will be stuck with their broken hip forever.” On tech teams, she continued, “Whatever we do, we stay flexible and we make changes that we can revert if needed, and that allows us ultimately to have this flow enabled that we don’t add damage to our company or environment.”

Progressive delivery is a series of technological solutions to help decrease cognitive load, allowing teams to roll back changes more easily. Observability and monitoring are also essential so bugs and causes of outages can be uncovered much faster.

Reflecting on the DevEx metrics triangle, Ashley said that it all comes back to that flow state. “Just being able to utilize your time well and keep working. That’s what developers want. Not being interrupted — context switching wastes a lot of time,” especially when developers are responsible for knowing several layers of the stack or are halted waiting for pull requests to be approved. “Work with users to understand the problems,” she said to shorten your feedback loops. And make sure you’re managing the developer cognitive load because “context switching like multitasking is not as efficient as doing one thing at a time.”

With her work as a consultant, she continuously runs some of the pulse surveys mentioned in the paper, asking:

  • Are you happy within the team?
  • Are you satisfied in your role?
  • Do you think you provide value?
  • Do you feel you are productive?

Is DevEx Just DevOps for the Individual?

It’s hard not to compare this DevEx approach to other widespread practices in the tech industry like DevOps and platform engineering. In part that’s because Nicole Forsgren is a prominent co-author of both these papers and of Accelerate, which is considered an essential DevOps text. But this DevEx paper also echoes the three goals of DevOps:

  • Shortening feedback loops with customers
  • Systems thinking and flow
  • Continuous experimentation and improvement

It’s just that, while they both aim to increase the velocity of the software development lifecycle, DevOps focuses on the team while DevEx focuses on the individual. But, of course, optimizing for more developers to reach their flow states should in turn reduce the time to deliver value to customers. And by delivering value to customers faster, this in turn tightens feedback loops, reduces developer frustration and more regularly offers that dopamine boost of doing work that matters.

As established in Accelerate, DORA metrics — deployment frequency, lead time for changes, mean time to recovery and change failure rate — are as important as ever. DevEx just focuses on the individual’s contribution to these team, department or division metrics.
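
Those four metrics are straightforward to derive once deployments are recorded consistently. A hedged sketch, assuming a simple record format of the team’s own choosing:

```python
from datetime import timedelta

# Assumed record format (illustrative): one dict per production deployment.
deployments = [
    {"lead_time": timedelta(hours=20), "failed": False, "time_to_restore": None},
    {"lead_time": timedelta(hours=30), "failed": True,  "time_to_restore": timedelta(hours=2)},
    {"lead_time": timedelta(hours=16), "failed": False, "time_to_restore": None},
]

def dora_summary(deploys: list[dict], days: int) -> dict:
    """Compute the four DORA metrics from a window of deployment records."""
    failures = [d for d in deploys if d["failed"]]
    restores = [d["time_to_restore"] for d in failures if d["time_to_restore"]]
    return {
        "deployment_frequency_per_day": len(deploys) / days,
        # (upper) median of the sorted lead times
        "median_lead_time": sorted(d["lead_time"] for d in deploys)[len(deploys) // 2],
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": sum(restores, timedelta()) / len(restores) if restores else None,
    }

print(dora_summary(deployments, days=7))
```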

And then if you look at the next level up, the discipline of platform engineering observes and learns from the work of different teams to find behavioral patterns and especially blockers to the value flow chain. It aims to reduce, abstract and automate any demotivating, repetitive and non-differential work. It also further reduces context switching so developers stay focused on delivering value to the end users.

“Platform teams have to actually be understanding where the organization is at and what’s holding back productivity and make sure that they’re tackling those things and showing the impact of them by measuring and tying that back to the goals of the business,” Noda said. This is what distinguishes the platform teams that are adding value during an economic downturn from the old-fashioned ones that just toss the platform over the wall and are more likely to be cut right now.

Also, whether it’s borrowing developers, embedding within the app teams, or running lunch-and-learns and regular surveys, we know the biggest factor in the success of platform teams is tightening that feedback loop with developers, prioritizing them as your internal customers.

So as organizations look to increase developer productivity, at a time of likely reduced headcount, there could be a strong argument to examine the developer experience at three levels — individual, team and company-wide — to truly unlock the power of developer experience. And to run regular surveys that look to measure psychological safety, so the presence of problems is surfaced early and often at each tier.

SRE vs. DevOps? Successful Platform Engineering Needs Both
https://thenewstack.io/sre-vs-devops-successful-platform-engineering-needs-both/ (Tue, 06 Jun 2023)


When talking about cloud native computing and digital transformation, two industry terms frequently appear: site reliability engineering (SRE) and DevOps. Often, they’re mentioned in opposition: SRE versus DevOps. But that is wrong.

To succeed in the cloud native world, organizations need both DevOps and SRE. Moreover, teams need a third element to assure transformation success as they move into the cloud native world: a platform engineering team.

That makes it important to understand the definition of each term, the distinctions between them, what they do and how they benefit business, as well as why organizations need all three to succeed.

What Is DevOps?

DevOps is a software methodology, but also an IT culture. It combines software development and IT operations to streamline software and services delivery, with the objective of building software more efficiently and harnessing automation as much as possible to drive faster deployment of higher-quality software. Its overall goal is to make system changes easier and rely on continuous improvement instead of massive improvement initiatives.

DevOps’ cultural implications come from its emphasis on enhanced collaboration and communication between different teams. Developers, operations staff, quality assurance (QA) professionals and security specialists all work together using automation tools to accelerate and standardize the development process. These teams also use CI/CD techniques to test, integrate and deploy software changes as quickly and reliably as possible.

What Problems Does DevOps Solve?

Legacy software development practices such as waterfall are typically quite slow and can cause conflicts between developers and operations teams. Prior to DevOps, the development team would already be working on a new project by the time operations completed QA and security checks. The organizational silos between development and operations discouraged collaboration to fix issues, instead promoting finger-pointing. This frustrated business clients and other stakeholders who were impatiently waiting for an application to move into production.

DevOps also solves the testing issue in traditional development environments. Without rigorous testing, software bugs can go undetected, which leads to unplanned downtime of critical production systems, user frustration and even lost revenue. With CI/CD, DevOps implements testing earlier, avoiding the last-minute rush to test quickly and push apps out the door.

Security is another critical issue. DevOps incorporates continuous security audits as an integral part of the development process to identify and address vulnerabilities before bad actors exploit them.

Benefits of DevOps

Some advantages of a DevOps culture include:

  • Faster time to market: DevOps enables organizations to bring new products and features to production faster through a streamlined development process and by eliminating bottlenecks.
  • Improved collaboration: Having teams working together helps to reduce silos and improve communication across the organization.
  • Better quality: With testing and deployment automation, DevOps can help to reduce the number of errors and improve the overall quality of the software.
  • Increased efficiency: Automation aids in velocity by reducing repetitive tasks and manual intervention.
  • Greater scalability: DevOps provides a framework to build scalable and resilient software capable of supporting rapidly growing businesses.

What Is SRE?

Site reliability engineering (SRE) is a discipline that applies software engineering to operations to build and maintain highly reliable and scalable applications. SRE started at Google but is now widely adopted throughout the technology industry.

Part of the SRE creed is that “every failure is an opportunity for learning” and thus engineers must find the problem’s contributing factors and make adjustments at the system level to ensure that particular issue doesn’t resurface.

What Problems Does SRE Solve?

First and foremost, SRE tries to reduce system outages and downtime by identifying and addressing issues quickly. With investigations and incident analyses, SRE teams contribute to the DevOps team’s ability to build and modify systems to be highly available and resilient by design.

SRE safeguards system performance to ensure that software in production meets all user needs, whether internal or external. The SRE team also monitors usage patterns and capacity to ensure that the IT environment can handle expected traffic, avoiding overload and service disruption.
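
One concrete tool SRE teams commonly use here is the error budget: an availability objective (SLO) implies a bounded amount of allowable failure, and the team slows feature work when the budget runs low. A minimal sketch with an illustrative 99.9% target:

```python
def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget left, given an availability SLO (e.g. 0.999)."""
    allowed_failures = (1 - slo) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1 - (failed_requests / allowed_failures)

# A 99.9% SLO over 1,000,000 requests allows 1,000 failures; 400 seen so far.
print(error_budget_remaining(0.999, 1_000_000, 400))  # -> 0.6 (60% of budget left)
```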

SRE teams collaborate closely with DevOps teams to confirm that issues are truly resolved. There is a constant feedback loop between SRE and DevOps to guarantee that flaws are fixed at the source and not just temporarily patched.

The Benefits of SRE

Beyond improving systems reliability — its primary objective — SRE teams help design operable systems that are less likely to fail or experience unplanned downtime. SRE promotes:

  • Faster incident resolution: With a data-driven approach to issue identification, SRE teams can address them quickly and reduce the time to detect and resolve incidents.
  • Efficient resource utilization: SRE teams optimize resource usage to ensure that systems can scale efficiently without requiring significant additional resources.
  • Improved collaboration: Close work with development teams ensures that software is designed with reliability in mind from the outset.
  • Greater automation: SRE teams use automation to reduce the risk of human error and increase efficiency, which frees up both DevOps and SRE teams’ time for more strategic work.

What Is Platform Engineering?

Platform engineering is the practice of building and maintaining an internal software platform — consisting of tools, services, and infrastructure — that lets developers effectively and efficiently build, deploy, operate and observe applications. Platform engineers’ objective is to enable developers to focus on writing code rather than infrastructure issues.

Many platform engineering teams designate “golden paths” for application development in pursuit of maximum reliability, quality and developer productivity. Golden paths are pre-architected and supported approaches to build and deploy software. If development teams use golden paths, then the platform engineering team supports production, and developers don’t have to learn all the underlying technology. This dramatically accelerates an application’s time to market.

Platform engineers monitor developer efficiency for the entire software development life cycle, from source code to production, to ensure that developers have the required tools and support to produce the highest-quality applications.

What Problems Does Platform Engineering Solve?

Platform engineering directly addresses the overall developer experience. Developers are getting more frustrated. According to a recent survey, DevOps team members spend, on average, more than 15 hours each week on activities other than coding.

This includes internal tool maintenance, development environment setup and pipeline debugging. The cost of this is astronomical. In the United States alone, businesses are losing up to $61 billion annually, according to Garden.io.

The complexity of managing today’s cloud native applications drains DevOps teams. Building and operating modern applications requires significant amounts of infrastructure and an entire portfolio of diverse tools. When individual developers or teams choose to use different tools and processes to work on an application, this tooling inconsistency and incompatibility causes delays and errors. To overcome this, platform engineering teams provide a standardized set of tools and infrastructure that all project developers can use to build and deploy the app more easily.

Additionally, scaling applications is difficult and time-consuming, especially when traffic and usage patterns change over time. Platform engineering teams address this with their golden paths — or environments designed to scale quickly and easily — and logical application configuration.

Platform engineering also helps with reliability. Development teams that use a set of shared tools and infrastructure tested for interoperability and designed for reliability and availability make more reliable software.

It also allows developers to access the tools they need themselves. Instead of using an IT ticketing system or having a conversation about creating a new database, a developer can simply spin it up in a user interface and know the configuration of any alerts, replications and operating parameters.

Finally, platform engineering addresses the high cost of building applications the traditional way, in which the development team purchases a broad range of tools and environments, frequently with overlapping functionality. Through standardization and automation, platform engineering minimizes these costs.

The Benefits of Platform Engineering

A well-designed development platform with tested and optimized golden paths helps developers build and deploy applications faster with pre-built components and infrastructure. This reduces the amount of required time and effort to build and configure these components from scratch. Other benefits include:

  • Standardization and consistency: Platform engineering delivers a standard set of tools and infrastructure to ensure that all applications built on the platform are consistent and meet the same quality standards.
  • Scalability and flexibility: Environments provided by the platform engineering team enable developers to deploy and scale applications quickly and easily.
  • Reduced operational costs: With task automation for deployment, monitoring and scaling, platform engineering frees up DevOps teams to focus on more strategic work.
  • Improved application reliability and availability: A platform engineering team provides a set of shared tools and infrastructure specifically designed for high uptime and 24/7 access.

Puppet’s 2023 State of DevOps Report found that platform engineering multiplies the chances of DevOps success.

What Are the Differences Between DevOps, SRE and Platform Engineering?

Organizations venturing into the cloud native world must do things differently to get transformative results; cloud native problems require cloud native solutions.

The first step is usually to adopt a DevOps culture if they don’t already have one. But DevOps needs support to make the transition and operate in cloud native environments. SRE and platform engineering teams provide such support.

It might be possible to get by with just two — or even one of these teams — but an organization aiming to modernize some or all of their workloads to cloud native should consider establishing all three teams.

  • DevOps: Responsible for the complete life cycle of apps, from source to production, and for modifying/enhancing apps post-production.
  • SRE: Primarily focused on application scalability, reliability, availability and observability. This team typically acts in crisis management mode when the performance or availability of an app is at risk.
  • Platform engineering: The definition is still evolving, but platform engineering’s role of setting standard tools and processes to speed development is acknowledged as an extraordinarily helpful bridge for DevOps to make the transition from monolithic to microservices-based cloud native computing.

Each team has a specific role and objectives, yet all three work together best to ensure the business can deliver cloud native applications and environments according to industry best practices.

How Chronosphere Supports All Three

The addition of DevOps, SRE and platform engineering teams boosts cloud native adoption and succeeds when these teams have complete visibility into their cloud native apps and cloud environments. This comes from a new generation of monitoring and observability solutions.

Cloud-hosted monitoring and application performance monitoring (APM) were born in the pre-cloud native world, one with very different assumptions. It’s no wonder they struggle with cloud native architectures. A cloud native observability solution like Chronosphere that is architected for modern digital business and observability can tie all three teams together.

With cloud native monitoring and observability, increased visibility into overall metrics usage and the power to set quotas for quickly growing services, Chronosphere gives organizations the flexibility and control they need over the entire application life cycle.

The Art of Platform Marketing: You’ve Gotta Sell It
https://thenewstack.io/the-art-of-platform-marketing-youve-gotta-sell-it/ (Tue, 06 Jun 2023)


“How do we get developers to actually use our platform?”

This is a question I’m often asked. A good first step is to make sure you take a product management approach and build an app platform that developers actually want to use, making sure that the golden path to production is not only useful but, well, fun. However, there is a second step that is often overlooked and misunderstood by platform teams: good, old-fashioned marketing. Once you have your platform set up, you have to build up what is essentially a full marketing plan to drive interest in and use of that platform. This includes not only brand, messaging and positioning and campaigns for outreach, but also platform advocacy.

What Platform Marketing Does

Platform marketing is used to drive awareness, trust and interest, but it also gives you an opportunity to get product management feedback about your platform. That last part is one of the underappreciated parts of advocacy (or “developer relations” as it’s sometimes called). When you’re developing in a product mindset, as most platform teams do, you’ll appreciate as much feedback as you can get from your customers — your developers. When infrastructure teams tell me they’ve built a platform or a Kubernetes cloud for developers but that developers aren’t using it, it’s usually because they need to do much more platform marketing.

Marketing doesn’t come easy to infrastructure people. It’s an off-putting word, perhaps only rivaled by “enterprise sales rep.” As ever with eye-roll-inducing phrases, what people actually dislike is bad, boring and useless marketing. At large organizations, most of the successful platform teams I talk with pay close attention to marketing, to good marketing. The likes of Mercedes-Benz, JPMorgan Chase, Duke Energy, The Home Depot, BT, the U.S. Air Force and Army and many others start their platform marketing plans from day one. And, in fact, marketing is a key part of scaling and sustaining how these organizations improve the way they make software.

I’ll be covering platform marketing as one of the “7 lessons from 7 years of running platforms” in my upcoming talk at PlatformCon, being held June 8 and 9. In the meantime, here’s a preview of one of those seven lessons: marketing and advocacy.

Brand

“Do you have a T-shirt yet?” my colleague DaShaun Carter likes to ask platform teams. This can seem like a flippant question, but it gets to an important part of platform marketing: establishing a brand. You need a name for your platform and the philosophy of software it supports. For example, the U.S. Air Force uses the brand Kessel Run, and JPMorgan Chase has the Gaia brand.

A brand performs two functions.

First, it creates an identity and a definition of what exactly your platform is. People tend to identify with the tools they use. They’re Java developers, Rust developers, Linux administrators, they follow XP or they’re site reliability engineers (SREs) instead of “DevOps engineers,” and so forth. That identity creates affinity and attraction to the brand — in this case, your platform. In doing so, it creates a certain joy in using the platforms and a passion for the platform.

Second, a brand helps define what your unique methodology and philosophy is. No matter if you’re doing agile, following DevOps principles, practicing SRE or sorting out what “platform engineering” means this quarter, you’ll need to adapt those methodologies to your organization’s unique needs. The sages of these methodologies aren’t so fond of cafeteria DevOps, where you just pick and choose the practices you want to use. However, in many large organizations, to get better, you need to make compromises and adapt stringent methodology principles.

Using your own name helps you take ownership of the methodology you’re putting together and change it as you learn what works. It’s a good time saver, too. As one executive told me on a long elevator ride a few years back, don’t ever use the word “agile” when you’re doing agile. The first thing that’ll happen is that someone will start complaining that you’re not doing real agile, that you’re doing it wrong. And then you get stuck in a narcissism-of-small-differences black hole instead of just getting on with the work.

The Book

You’re certainly going to need a manual, training, documentation and the usual three-ring binder material. But you’ll also want to write up the thinking that’s behind the brand. You need to codify and distribute your intentions, goals and principles. This is something more tactical, more usable than vision and strategy.

The exact content of The Book will vary, so it’s good to look at examples for inspiration. While it’s just a narrow slice of what would be in The Book, the UK’s Government Digital Service has a great list of design principles. You can see how we think about software at VMware Tanzu Labs in things like our FAQ and books like “Radically Collaborative Patterns for Software Makers.”

As you scale your platform to hundreds, then thousands of developers, this ongoing documentation of your thinking will be critical. It’s more than just tech documentation; it’s documenting the culture that your platform is built to support. The book will also help the platform team itself remember the point of the platform and of their work: for example, getting the organization focused on building well-designed software, using lean-design techniques and deploying weekly.

Platform Advocacy

Finally, the successful platform teams I talk with have very active platform advocacy. This means having at least one person working full time to just talk with, work with and listen to the people who use your platforms, usually developers. The role of “developer advocate” is pretty well understood by us vendors and cloud providers. Developer advocates love talking to people, and we also love talking about our craft. This means you can find out how it’s done easily by just asking us.

You’ll probably start with just one platform advocate who visits developer teams throughout your organization, learning what these teams do, teaching them how to use the platform and associated methodologies, and listening to their feedback. The advocate acts as a spreader of your platform, a booster and an explainer. Also, often overlooked, the advocate takes feedback from developers and others back to the platform team. They advocate both for the platform team and for the platform users.

As your platform and overall software transformation scale, you’ll add more advocates. Like JPMorgan Chase, you might even have a whole team of platform advocates. The Cloud Foundry platform team at Mercedes-Benz provides training, systematic feedback collection, quarterly community updates and numerous other community management functions that you’d expect an advocate to help with.

One of the common, maybe required, practices the advocacy team follows is holding quarterly internal conferences. These are actual, in-person conferences, often rotating through different regions and office locations with an online component. At these conferences, your platform team and executive sponsors talk a little bit about the platform, but you mostly get your customers — developer teams — to present and talk about the projects they’ve worked on. This serves two functions: training and, that’s right, marketing.

The marketing you’re taking advantage of at internal conferences is the most coveted of all marketing treasures: word of mouth. Having developers tell other developers that your platform is good, great even, will be the best way to get developers to use your platform, and use it well.

Start Platform Marketing on Day One

In addition to those important aspects of platform marketing, you’ll also need to do some marketing fundamentals, like producing content and documentation and working with product management to understand your customers and go to where they are, so to speak.

I haven’t seen many platform teams (or any, perhaps) that have scaled and sustained their developer platform without platform marketing. You’ve got to start thinking about marketing from day one, assigning at least one full-time advocate to start that work of creating a brand name and documenting your ongoing platform philosophy and principles. As with developer advocacy, you don’t need to spend time reinventing the wheel: Tech marketing is a well-understood set of practices. The trick is to actually do it.

If you want to hear the other six lessons of scaling and sustaining platforms in large organizations, check out my full talk at PlatformCon, “7 lessons from 7 years of running platforms.”

The post The Art of Platform Marketing: You’ve Gotta Sell It appeared first on The New Stack.

]]>
7 Core Elements of an Internal Developer Platform https://thenewstack.io/7-core-elements-of-an-internal-developer-platform/ Mon, 05 Jun 2023 13:41:04 +0000 https://thenewstack.io/?p=22709978

What does it take to build an internal developer platform? What are the tools and platforms that can make it work? This post will discuss the architecture and tools required to stand up a fully operational internal developer platform. To see the actual steps of setting up the platform, watch this video.

Why Do We Want an Internal Developer Platform?

Platform engineering’s overarching goal is to drive developer autonomy. If a developer needs a database, there should be a mechanism to get it, no matter if that person is a database administrator or a Node.js developer. If a developer needs to manage an application in Kubernetes, that person doesn’t need to spend years trying to understand how Kubernetes works. All these actions should be simple to accomplish.

A developer should be able to accomplish what they need by defining a simple manifest or using a web UI. We want to enable all developers to consume services that will help them get what they need. Experts (platform engineers) will create those services in the internal developer portal, and users will consume them in its graphical user interface or by writing manifests directly and pushing them to git.

The High-Level Design of an Internal Developer Platform — 7 Core Elements

An internal developer platform needs several parts to become fully operational. For each part we will recommend a tool, though each can be exchanged for a similar one. The core idea is to map out the functionalities needed to build the platform:

  • A control plane: The platform needs a control plane that will be in charge of managing all the resources, no matter if they are applications running in a Kubernetes cluster or elsewhere, or if the infrastructure or services are in Amazon Web Services (AWS), Azure, Google Cloud or anywhere else. Our recommended tool here is Crossplane.
  • A control plane interface: This will enable everyone to interact with the control plane and manage resources at the right level of abstraction. Our recommended tool here is Crossplane Compositions.
  • Git: The desired states will be stored in git, so we’ll have to add a GitOps tool into the mix. Its job will be to synchronize whatever we put in git with the control plane cluster. Our recommended tool here is Argo CD.
  • Database and schema management: Given that state is inevitable, we will need to have databases as well. Those databases will be managed by the control plane but to work well, we will also need a way to manage schemas inside those databases. Our recommended tool here is SchemaHero.
  • Secrets manager: For any confidential information that we cannot store in git, we’ll need a way to manage secrets in a secrets manager. Those secrets can be in any secrets manager. Our recommended tool to pull secrets from there is External Secrets Operator (ESO).
  • An internal developer portal/graphical user interface: In case users don’t want to push manifests directly to git, we should provide them with a user interface that will enable them to see what’s running as well as to execute processes that will create new resources and store them in git. Our recommended tool here is Port.
  • CI/CD pipelines: Finally, we will need pipelines to execute one-shot actions like the creation of new repositories based on templates, building images for new releases, making changes to manifests and so on. Our recommended tool here is GitHub Actions.

The setup will require a few additional tools, but the list above is a must.

The diagram below shows how each of the elements interacts with each other. You can use it as a reference as you read through this article.

Let’s examine the role of each layer in the setup:

Control Plane

Let’s talk about control planes: We need a single API acting as an entry point. This is the main point of interaction for the internal developer platform. In turn, it will manage resources no matter where they are. We can use Crossplane with providers, which enables us to manage not only Kubernetes but also AWS, Google Cloud, Azure or other types of resources. Each provider installs custom resource definitions (CRDs), so we can use kubectl to create deployments and services, manage databases in hyperscaler clusters and so on.
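
To make this concrete, here is a minimal sketch of a Crossplane managed resource, in this case a PostgreSQL instance on AWS. Treat the API group, version and field names as illustrative; the exact names depend on the provider release you install:

  apiVersion: database.aws.crossplane.io/v1beta1
  kind: RDSInstance
  metadata:
    name: orders-db
  spec:
    forProvider:
      region: us-east-1
      dbInstanceClass: db.t3.small     # instance size
      engine: postgres
      engineVersion: "13"
      allocatedStorage: 20             # in GB
      masterUsername: adminuser
      skipFinalSnapshotBeforeDeletion: true
    # Crossplane writes the generated credentials here for applications to consume.
    writeConnectionSecretToRef:
      name: orders-db-conn
      namespace: crossplane-system

Applying this manifest with kubectl is all it takes for the provider to create, and then continuously reconcile, the actual database in AWS.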

However, this alone isn’t enough for a full-fledged internal developer platform. An application can easily consist of dozens of resources. Infrastructure can be much more complicated than that. Most importantly, all those low-level resources are not at the right levels of abstraction for people who are not Kubernetes or AWS or Google Cloud specialists. We need something that is more user-friendly.

A User-Friendly Interface for the Control Plane

The control plane interface can act as the platform API when you’re 100% GitOps. It shouldn’t be confused with the internal developer portal, which acts as the graphical user interface. We can use Crossplane Compositions for that.

What is the right level of abstraction for the users of the platform we’re building? The rule is that we should hide, or abstract, anything that people don’t really care about when they use the internal developer platform. For instance, they probably don’t care about subnets or database storage. The right level of abstraction depends on the actual use of the platform and will differ from one organization to another. It’s up to you to discover how to best serve your customers and everyone else in your organization.
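
For example, a developer-facing database claim might expose only a name and a size. This is a sketch; the DatabaseClaim kind, its API group and its fields are hypothetical, since your own platform API defines them:

  apiVersion: platform.example.org/v1alpha1   # hypothetical group defined by your platform team
  kind: DatabaseClaim                         # hypothetical claim kind
  metadata:
    name: orders-db
    namespace: team-orders
  spec:
    size: small   # the only knob developers turn; the composition maps it to
                  # instance class, storage, backups, networking and so on
    writeConnectionSecretToRef:
      name: orders-db-conn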

Crossplane Compositions enables us to create abstractions that can simplify the management of different kinds of applications. Next, we probably do not want anyone to interact directly with the cluster or the control plane. Instead of people sending requests directly to the control plane, they should be storing their desired states in git.

Synchronize from Git with GitOps

Changing the state of resources by directly communicating with the control plane should not be allowed, since no one will know who changed what and when. Instead, we should push the desired state into git and, optionally, do reviews through pull requests. If we plug GitOps tools into the platform, the desired state will be synchronized with the control plane, which in turn will convert it into the actual state.

This is a safer approach as it doesn’t allow direct access to the control plane and also keeps track of the desired state. I recommend doing this with Argo CD, but Flux and other solutions are just as good.
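
A minimal Argo CD Application that implements this synchronization might look roughly like the following; the repository URL and paths are placeholders:

  apiVersion: argoproj.io/v1alpha1
  kind: Application
  metadata:
    name: team-orders
    namespace: argocd
  spec:
    project: default
    source:
      repoURL: https://github.com/acme/control-plane-manifests  # placeholder repo
      targetRevision: main
      path: teams/orders
    destination:
      server: https://kubernetes.default.svc  # the control plane cluster itself
      namespace: team-orders
    syncPolicy:
      automated:
        prune: true     # remove resources that were deleted from git
        selfHeal: true  # revert manual changes back to what git says

With automated sync, prune and self-heal enabled, git remains the single source of truth.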

Schema Management

Databases need schemas. They differ from one application to another. To complete our internal developer platform, we need to figure out how to manage schemas, preferably as part of application definitions stored in git. There are many ways to manage schemas, but only a few enable us to specify them in a way that fits into the git model. The complication is that GitOps tools work only with Kubernetes resources, and that means that schemas should be defined as Kubernetes resources as well. This requires us to extend the Kubernetes API with CRDs that will enable us to define schemas as Kubernetes resources. I recommend using SchemaHero for that.
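
With SchemaHero, a table definition becomes just another manifest in git. Here is a minimal sketch; check the API version and field names against the SchemaHero release you run:

  apiVersion: schemas.schemahero.io/v1alpha4
  kind: Table
  metadata:
    name: users
    namespace: team-orders
  spec:
    database: orders-db   # must match a SchemaHero Database object
    name: users           # the table name inside that database
    schema:
      postgres:
        primaryKey: [id]
        columns:
          - name: id
            type: integer
          - name: email
            type: varchar(255)
            constraints:
              notNull: true

SchemaHero diffs this desired schema against the live database and applies the migration, so schema changes flow through the same GitOps process as everything else.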

Secret Management

Some information shouldn’t be stored in git. Having confidential information such as passwords in git could easily result in a breach. Instead, we might want to store those in a secret manager like HashiCorp Vault or a solution provided by whichever hyperscaler you’re using. Still, those secrets need to reach the control plane so that processes inside it can authenticate with external APIs or access services, for example, databases. I recommend using External Secrets Operator (ESO) for that.
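
An ExternalSecret resource is a pointer, not a value: it tells ESO which entry to copy out of the external store into a regular Kubernetes Secret, so only the pointer lives in git. The store name and key path below are placeholders:

  apiVersion: external-secrets.io/v1beta1
  kind: ExternalSecret
  metadata:
    name: orders-db-password
    namespace: team-orders
  spec:
    refreshInterval: 1h           # re-sync from the store every hour
    secretStoreRef:
      name: vault-backend         # a SecretStore/ClusterSecretStore configured separately
      kind: ClusterSecretStore
    target:
      name: orders-db-password    # the Kubernetes Secret ESO will create
    data:
      - secretKey: password
        remoteRef:
          key: databases/orders   # path in the external secrets manager
          property: password

The actual password never touches git; ESO fetches it at runtime and keeps the resulting Kubernetes Secret in sync.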

Internal Developer Portal — Graphical User Interface

The internal developer platform needs a user interface to sit on top of everything we’ve built so far. This is the internal developer portal. It provides both a catalog of services people can use and an interface for developers to autonomously perform the actions we want to enable. Specifically, we need a way to initialize a process that will create new repositories for applications, add sample code, provide manifests for the databases and other dependencies, create CI/CD pipelines and so on.

For this setup we began with the Kubernetes catalog template from Port.

We will then add two additional blueprints related to the cluster blueprint: Backend App and Environment.

CI/CD Pipelines

Finally, we need pipelines. They are the last piece of the puzzle.

Even though we are using GitOps to synchronize the desired state into the actual state, we need pipelines for one-shot actions that should be executed only once for each commit. These could be steps to build binaries, run tests, build and push container images and so on.
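
A trimmed-down GitHub Actions workflow for that per-commit work might look like this; the registry, image name and the manifest-bump script are placeholders for whatever your setup uses:

  name: build-and-push
  on:
    push:
      branches: [main]
  jobs:
    release:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - name: Run tests
          run: make test   # placeholder for your test command
        - name: Log in to the registry
          uses: docker/login-action@v3
          with:
            registry: ghcr.io
            username: ${{ github.actor }}
            password: ${{ secrets.GITHUB_TOKEN }}
        - name: Build and push the container image
          uses: docker/build-push-action@v5
          with:
            push: true
            tags: ghcr.io/acme/orders:${{ github.sha }}   # placeholder image name
        - name: Update manifests in the management repo
          run: ./scripts/bump-image-tag.sh ${{ github.sha }}   # hypothetical helper script

The last step is what hands control back to GitOps: once the management repo changes, Argo CD takes it from there.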

The Internal Developer Platform in Action

From the user (developer) perspective, a new application can be created with a simple click on a button in a Web UI or by defining a very simple manifest and pushing it to git. After that, the same interface can be used to observe all the relevant information about that application and corresponding dependencies.

Behind the scenes, however, the flow would be as follows.

  1. The user interacts with a Web UI (Port) or directly with git. The job of the internal developer portal in this case is to trigger an action that will create all the necessary resources.
  2. Creating all the relevant resources is a job done by the pipeline such as GitHub Actions. In turn, it creates a new repository with all the relevant files, such as source code, pipelines, application manifests, etc.
  3. As a result of pushing changes to the application repository (either as a result of the previous action or, later on, by making changes to the code), an application-specific pipeline is triggered (GitHub Actions) which, as a minimum, builds a container image, pushes it to the image registry and updates the manifests in the management repo, which is monitored by GitOps tools like Argo CD or Flux.
  4. GitOps tools detect changes to the management repo and synchronize them with the resources in the control plane cluster.
  5. The resources in the control plane cluster are picked up by corresponding controllers (Crossplane), which in turn create application resources (in other Kubernetes clusters or as hyperscaler services like AWS Lambda, Azure Container Apps or Google Cloud Run) as well as dependent resources like databases (self-managed or as services in a hyperscaler).

The post 7 Core Elements of an Internal Developer Platform appeared first on The New Stack.

]]>
How to Host Your Own Platform as a Product Workshop https://thenewstack.io/how-to-host-your-own-platform-as-a-product-workshop/ Wed, 31 May 2023 16:09:43 +0000 https://thenewstack.io/?p=22709258

Often, when organizations start talking about platform engineering, it’s not necessarily about getting started. After all, most companies already have some form of platform. It just likely will have low adoption rates because it’s been a top-down initiative where operations is throwing something over the silo.

That should change when organizations adopt a Platform as a Product strategy. This practice embraces the socio-technical side of platform engineering that considers the intersection of people, processes and technology — not just the latter. This mindset shift is about focusing first not on building internal features, but on serving and engaging with your internal developer customers. Perhaps most importantly during a time of tech layoffs, this practice focuses on learning to solve complex problems using the platform technology — so not thinking tech first, but problem first.

Anna Ciula and Cansu Kavili Örnek consult with Red Hat customers to help them build fit-for-purpose platforms that embrace this platform engineering mindset. Red Hat’s Open Innovation Labs team has developed and open sourced a half-day Platform as a Product workshop. At WTF is SRE, the team’s engagement lead Ciula and architect Kavili Örnek reflected on their last two years leveraging this workshop to help teams get the most out of their platforms.

Platform as a Product Delivers DevOps

You might’ve read that DevOps is dead. While that’s a catchy headline, platform engineering is better seen as the delivery mechanism that finally makes DevOps real.

“What I see is that platform engineering is the natural extension or natural next step to DevOps implementation — like the mature version of DevOps. Because the technology got super complicated and there’s this overhead, new things coming up every single day that developers need,” Kavili Örnek said. “Platform engineering is here to provide them with some common, reusable capabilities to take this cognitive load from them so that they can actually focus on what they like to do — the business applications, developing the software.”

With the DevOps is dead mindset, she fears, job titles will change, but the ways of working will stay the same. In order to address the challenges highlighted in Puppet’s State of Platform Engineering Report — including cycle time too slow, resistance to platform team adoption, and lack of communication — these teams must adopt a Platform as a Product mindset, a term coined by Team Topologies.

“Applying this modern product management mindset into the building of platforms” is how she defined Platform as a Product, admitting this is a “big hard shift from focusing on the underlying technology of the platform, but, instead, focusing on what are actually user-developer needs, and building that platform around them.”

The Platform as a Product workshop is designed in response to common challenges Red Hat customers have shared, Ciula explained. “We saw the devs being utterly frustrated with the ops throwing all the obstacles under their legs. And then, ops being utterly frustrated with developers calling them stupid because they don’t know how to use the platform.” These reactions have all led to ops deferring to command-and-control mandates to use the platforms and to what she called “the shadow IT rebels creating their own clusters somewhere on the credit card.”

Like all of platform engineering, this workshop looks to create a common language and get everybody on the same page, with all stakeholders buying into the benefits of a Platform as a Product mindset.

Wait, What Is a Platform?

Your Platform as a Product workshop is only useful if everyone it can be useful to is at the table. That means users and customers, but also what Ciula calls influencers, like the compliance and security teams. As we wrote about before, platform engineering abstracts out what Syntasso’s Abigail Bangser calls “non-differentiating, but not unimportant work” so that app teams can focus on delivering business value faster.

After a brief introduction, this workshop has all those stakeholders prioritize the challenges to platform engineering success, including:

  • High developer cognitive load
  • Developers don’t trust and don’t adopt the platform
  • Poor developer experience
  • Customers are suffering from bad services
  • Platform is too expensive
  • Platform is too slow to change

Contrary to its name, DevOps has mostly focused on operations, while this workshop highlights both the developer experience and the end customer experience more, as well as connecting the whole platform initiative to the business.

“This is usually a very, very helpful discussion,” assured Ciula, but, particularly with regulated environments, she warned, “it tends to be heated because people have different views of what’s more important.”

That’s why you should allow enough time to get everyone focused on the biggest problem, without letting the conversation derail.

Next, you break out the virtual or very sticky Post-Its and clarify as a group:

  • What is a platform? (It’s a single pane of glass.)
  • What is not a platform? (It’s not just Kubernetes.)
  • What does a platform do?
  • What should a platform not do? (It’s not a catch-all that supports all edge cases.)

It’s essential, Kavili Örnek said, that all stakeholders are in the room so you don’t just have an ops view or a dev view — anyone who would benefit from the platform should feel represented.

“This helps us draw the boundaries around what this platform is going to look like and what this platform is going to be responsible for,” she said. “It’s really important for everyone to understand what this platform is and what sort of problem we are trying to solve.”

What Is a Platform as a Product?

“Typically, everybody here knows what a product is, that it is solving a problem or [is a] response to a need,” Ciula said. But it’s important to reiterate that “It has to be feasible. It has to be technically viable. It has to be desirable.”

This relies on everyone understanding the distinction between a features team and an empowered product team — platform engineers will solve the problems but not necessarily implement the solutions.

“Empowered product teams are handed problems to solve and not solutions to implement,” she said.

This enables teams to apply this product lens onto the platform, which has a distinctive business skew. Most often, goals include reducing time to market, operational burden and/or developer cognitive load.

“It should be built for developers, with developers, responding to their basic needs,” Ciula said, “and obviously it needs to be technically feasible.”

She pointed to this as an “ah-ha” moment, when “a lot of our customers are saying ‘Shit! No one is really doing this. We don’t really have a platform product manager whose job it is to make sure that we’re doing something for the business.’”

Furthermore, she has received feedback that platform teams weren’t really thinking about the developers and the services they need, beyond educating them on how to use the platform.

One of the first steps to treating the platform like a product, not just like another top-down Waterfall initiative, is by having a product backlog.

“In that backlog, you don’t just put things about building and maintaining the platform,” Ciula said. Internal marketing is just as important: “You have to think about things like evangelizing your product, making people want it more.”

This not only helps create transparency for developers, but, once they get word of the platform, requests will come pouring in. That backlog helps manage your internal users’ expectations.

“We want developers to come to our platforms. We don’t want to make their lives miserable. This evangelism and understanding what they actually need, what they want,” Kavili Örnek said. Pointing to one of the three pillars of DevOps, she emphasized that “It’s really important to have this conversation and feedback loop with them.”

Next, the workshop applies Team Topologies’ four characteristics of a product to the platform:

  • Optional to use
  • Carefully designed and curated to user needs
  • Simplifies something for users
  • Evolves alongside technology

For each of these points, ask yourself, can this be said of your current platform?

Of course, especially in regulated environments, Kavili Örnek hears nays in terms of optionality. And that may be fair where regulations and security cannot be optional. But this exercise is really to show all stakeholders that “you should build your platform in a way that developers want to use it,” she explained. “It should be so compelling that they don’t want to look for any other alternative. They want to use your platform.”

Otherwise, you’re facing what she calls the Empty Cluster Problem, where only a few applications run on top of your platform because the majority of your app teams are not on board with it.

“This platform is optional. We’ve got to make it so good and we will listen to them and they will come and use it and hopefully evangelize for us internally,” Kavili Örnek said, setting the ideal objective for platform teams.

Creating Platform Metrics that Matter

The third module of this Platform as a Product workshop comes down to measuring success. Typically, when running this workshop, the Red Hat team finds people are simply measuring velocity, story points and number of defects, which Ciula finds a bit disappointing.

While the DORA metrics — deployment frequency, lead time for changes, mean time to recovery and change failure rate — are considered an industry standard, her customers rarely apply them. That means a lot of this 15-minute module is aspirational.

“We want to reiterate to them that platform should do something for the business,” Ciula said. “The platform is not just for technology. It is supposed to deliver something that can have a measurable outcome on the business.”

Beyond DORA, this workshop encourages consideration of platform adoption rate, developer onboarding time, developer net promoter score and more.

“We want to leave them with some food for thought so that once they get serious about the platform and they buy into this approach of Platform as a Product, they do it properly.”

Mobius Loop for Continuous Feedback

While there are other modules available in this workshop, perhaps it’s most important that it ends on the next steps, so participants are ready to take action.

Red Hat has facilitated the creation of an open practice library around an iterative delivery cycle with a tighter feedback loop, portrayed as a sideways figure eight or infinity symbol. This Mobius Loop pulls from DevOps, agile, design thinking, impact mapping and human-centered design, among others, so teams can choose the relevant practices for each situation:

  • Why are you solving a problem?
  • Who are you solving it for?
  • What are the problems being solved?
  • How can they be solved?

Then they can design and implement experiments they believe can achieve the outcome. It’s a continuous discovery and delivery journey. In part, Ciula explained, this is to make sure that the platform team doesn’t get stuck on just delivering features without taking time to revisit the discovery phase.

In the end, Kavili Örnek argues, a successful Platform as a Product strategy hinges on organizational psychological safety, which is why the Mobius Loop contains both technical and cultural practices for the sociotechnical practice of platform engineering.

Images by Ana Ciobotaru.

The post How to Host Your Own Platform as a Product Workshop appeared first on The New Stack.

]]>
Take a Platform Engineering Deep Dive at PlatformCon 2023 https://thenewstack.io/take-a-platform-engineering-deep-dive-at-platformcon-2023/ Fri, 26 May 2023 17:00:45 +0000 https://thenewstack.io/?p=22709217

The highly anticipated PlatformCon 2023 is fast approaching, accompanied by a colossal amount of industry hype, and it’s easy to see why. The two-day virtual conference, which will be held June 8-9, celebrates the more than 15,000-member platform engineering community and features a large lineup of renowned industry speakers. Thousands of platform engineers and practitioners from around the world will participate, welcomed by a packed schedule and the opportunity to dive deep into the latest platform engineering trends, solutions and best practices.

Reasons to Attend PlatformCon 2023

Attendees joining PlatformCon 2023 can expect to level up their platform engineering skills by networking with experts and joining a vibrant community of platform engineers, all dedicated to pushing boundaries. Participants can enjoy regional kickoff events, watch talks at their own pace and engage in speaker Q&A sessions over on the Platform Engineering Slack channel.

The full conference schedule is available here. Attendees will get the chance to:

  • Engage with renowned industry speakers like Nicki Watt, CEO/CTO of OpenCredo; Bryan Finster, value stream architect at Defense Unicorns; Stephan Schneider, digital expert associate partner at McKinsey; Charity Majors, CTO at Honeycomb; and Manuel Pais of Team Topologies.
  • Meet professionals from all over the globe who share similar interests.
  • Be inspired by new insights and fresh ideas for platform engineering initiatives.
  • Explore multiple tracks and listen to top field experts tell their stories.

Hundreds of Captivating Talks Spanning Five Tracks

Over the two days, PlatformCon 2023 will feature a diverse range of compelling talks covering five tracks:

Stories: Practitioners will share their enterprise platform-building experiences, covering the journey from inception to implementation and rollout. Examples include Adobe’s Rohan Kapoor discussing the development of an Adobe internal developer platform for over 5,000 developers, addressing challenges, productivity measurement and learnings from a recently launched CI/CD product.

Tech: This track will delve into the technical aspects of developer platforms. Expect talks from speakers such as Susa Tünker from Humanitec who will discuss eliminating configuration drift between environments, and sessions on problem-solving using various tools such as Kubernetes, Infrastructure as Code, service catalogs and GitOps.

Blueprints: Speakers in this track will present proven platform blueprints, including McKinsey’s Mike Gatto and Stephan Schneider who will explore simplifying developer platform design through reference architectures. Attendees can expect other talks to highlight key design considerations and effective integration of developer platform tools.

Culture: Focusing on building developer platforms by engineers for engineers, this track will examine the cultural aspects of platform engineering. Among the topics that will be discussed are product management and the relationship between platform engineering, DevOps, and site reliability engineering. Nicki Watt from OpenCredo will address stumbling blocks hindering the creation of a great platform as a product, and offer counteractive solutions.

Impact: This track will explore the business value and impact of platform engineering initiatives. Analysts like Gartner’s Manjunath Bhat will provide value stories to demonstrate how platform engineering accelerates business outcomes, while other practitioners will discuss strategies for securing executive buy-in.

For anyone looking to level up their platform engineering skills, this is a great opportunity to learn, network with and be inspired by the best in the industry. Register now for new speaker updates, chances to get involved and details about in-person and virtual meetups.

The post Take a Platform Engineering Deep Dive at PlatformCon 2023 appeared first on The New Stack.

]]>
Developer Platforms: Key Findings from a Forrester Snapshot https://thenewstack.io/developer-platforms-key-findings-from-a-forrester-snapshot/ Fri, 19 May 2023 17:52:11 +0000 https://thenewstack.io/?p=22708514

With tough times hitting the tech industry, you might wonder how platform engineering fits into a winning business strategy.

According to a recent Forrester Opportunity Snapshot commissioned by Humanitec, investing in an internal developer platform (IDP) can be the key to better business performance. IDPs improve developer experience (DevEx) and developer productivity, which in turn helps shorten time to market (TTM), improves software supply chain best practices and drives revenue growth. Here’s what you should know.

Developer Productivity Drives Business Success

When making the case for an IDP to the business, you need to frame your argument around the priorities the business cares about. For example, many organizations recognize that keeping pace with digital transformation is critical to survival.

In this case, an IDP can significantly improve DevEx and productivity to help shorten innovation cycles and drive the success of digital initiatives. In fact, according to the Snapshot, 87% of DevOps leaders name increasing developer productivity as their top priority for the next 12 months. Eighty-five percent say better meeting business demand is a focus, and 85% are shortening release cycle times to help ensure continuous transformation.

But boosting developer productivity is easier said than done. DevOps leaders report a variety of significant obstacles, including pandemic-related issues with onboarding, training and mentoring (41%), and a lack of DevOps practices (25%). Responses also highlighted ways that operations can create challenges for digital transformation, including the insufficient integration of security practices into the development process (23%) and too many manual and/or ticket-based processes (23%).

Platforms Are Key to Improving DevEx

IDPs are becoming a popular solution to poor DevEx, and for good reason. According to the Snapshot, 43% of DevOps leaders agree that an IDP enables true developer self-service. Developer self-service helps alleviate one of the biggest sources of inefficiency caused by increasingly complex cloud architectures: over-reliance on Ops.

But what kind of effect does better DevEx have on your business’s bottom line?

Elevate DevEx and Shorten Time to Market

By removing friction and bottlenecks from the software delivery process, an IDP not only improves DevEx, it enables development teams to work more efficiently to reduce TTM and increase revenue. Seventy-four percent of DevOps leaders say that improving DevEx drives better developer productivity, 77% say that it can shorten time to market, and a whopping 85% report a positive impact on revenue growth. Better DevEx can also help organizations better meet customer demand. Seventy-five percent of respondents say they can better attract and retain customers, and 82% report increased customer satisfaction. Lastly, 81% say that investing in DevEx has a positive impact on developer recruitment and retention.

In short, platform engineering lives up to the hype. Not only can IDPs improve DevEx, but they also have measurable benefits for the businesses that invest in them: shorter TTM, better customer retention and revenue growth. Start your journey to better business performance and download the full snapshot here.

The post Developer Platforms: Key Findings from a Forrester Snapshot appeared first on The New Stack.

]]>
How Otomi Helped the City of Utrecht Move to Kubernetes https://thenewstack.io/how-otomi-helped-the-city-of-utrecht-move-to-kubernetes/ Mon, 15 May 2023 17:00:01 +0000 https://thenewstack.io/?p=22706920

With digital transformation sweeping across industries, we are seeing more and more organizations adopting cloud native technologies to modernize their IT infrastructure. Kubernetes has become the go-to solution for many when managing containers at scale.

While my experience building Red Kubes as CTO has highlighted the need for these technologies, it has also shed light on how integral the adoption process is for companies and organizations, such as The Municipality of Utrecht in the Netherlands.

Together, we addressed a common issue: complex and siloed applications. For context, Utrecht is one of the largest municipalities in the Netherlands and deals with a myriad of applications and huge volumes of data.

Essentially, its IT infrastructure needed a more modern approach to improving its services for the residents. I’m sure you’ve personally experienced the struggle and frustration of trying to get something from your council, municipality, or city.

The Challenge:

At Red Kubes, we designed Otomi (our open source platform) to address these issues, and we personalize each aspect of the platform to meet the needs of the user. Considering the challenge lay in speeding up delivery, building connections between these silos was of utmost importance.

Before we stepped in, the process of updating (or even changing) applications was time-consuming, costly and complex.

Furthermore, there was an increasing need for collaboration and information exchange between municipalities, but the current architecture made it difficult to achieve.

I believe many organizations are facing similar issues in modernizing their infrastructure to support more modern application architectures.

To address these challenges, Utrecht, along with 15 other major cities, initiated a review of their current information systems and architecture based on “Common Ground.”

The goal was to establish modern standards for data exchange between municipalities through microservices and an API-driven approach. The new standards could not be supported by the existing infrastructure so there was a need to transition to a modern architecture.

As applications and workloads were to be containerized for better cloud portability, Kubernetes was identified as the ideal solution for container orchestration.

Utrecht recognized that they would need to hire talent or contractors with the necessary skills and expertise to set up and manage a Kubernetes environment.

It’s a good thing the city was aware of the complexity of Kubernetes, and especially of what comes after installing a Kubernetes cluster.

The Solution:

Utrecht searched for a solution that would make Kubernetes easily manageable and ready for production without requiring extensive staff training or hiring new talent in such a tight market. The proposed solutions revealed that our open source project Otomi could deliver on the requirements.

In a nutshell, Otomi simplifies Kubernetes engineering and the management of all the additional components required to run Kubernetes in a secure, compliant and automated way, providing self-service to developers. It is designed to enable organizations to get the most out of their containerized applications in just a few days.

Utrecht successfully adopted Kubernetes technology by leveraging Otomi and creating a platform engineering team to build a production-ready platform on top of the Azure Kubernetes environment.

This allowed developers to concentrate on coding while the platform engineering team focused on security, compliance, scalability and stability (the important stuff in Kubernetes environments!).

By combining AKS (Azure Kubernetes Service) and Otomi, Utrecht was able to set up its production-ready Kubernetes environment within a few days instead of the many months it would have taken using traditional methods.

The Results: Technical, Operational and Security

With the implementation of Kubernetes, topped with Otomi, the outcomes for the city included a host of technical, operational and security benefits. From a technical standpoint, the deployment resulted in faster, automated testing, enhanced observability, monitoring and immediate access to root cause analysis (RCA).

Additionally, automatic scaling of the Kubernetes environment was achieved, a process that took three to six months before Kubernetes and Otomi. Now, development environments can be deployed within one minute, providing instant self-service for development teams, compared to months in the legacy architecture.

Utrecht explained to us that the benefits of Otomi were also significant from an operational perspective. Applications can now be deployed within one day, compared to the previous process which took months.

Furthermore, the entire journey from application concept to production now averages around four weeks, compared to the prior duration of at least six to nine months.

The platform also achieved stability with 24/7 uptime, automatic restart and recovery, and up to 40% productivity gain for developers through Otomi’s self-service capabilities.

We were able to uplift the security posture as well, as the implementation resulted in numerous improvements, including adherence to Open Web Application Security Project (OWASP) guidelines, microsegmentation, live scanning, traceability, cluster and network policy enforcement, and more.

While naturally I’m biased, the solution worked extremely well. Utrecht’s senior manager of digital services, Lazo Bozarov, shared that the platform has allowed the municipality to accelerate its containerization and cloud journey, in which they have modernized their architecture toward microservices and an API-centric infrastructure. Goal achieved.

By integrating Otomi with Kubernetes, containerization is simplified, reducing the need for extensive environment management. This results in organizations accelerating their container platform’s time-to-value and the applications on it. For organizations like Utrecht, implementing Otomi on top of Kubernetes will lead to substantial cost savings, time reduction and risk mitigation.

As someone who has co-engineered this product from the ground up, it’s rewarding to see these real-life adoptions actually making a difference. It’s also exciting to see how Kubernetes can revolutionize IT infrastructure modernization. There’s a bright future ahead for the world of Kubernetes, especially in organizations such as these.

The post How Otomi Helped the City of Utrecht Move to Kubernetes appeared first on The New Stack.

]]>
IBM Cloud CTO: Pros Outweigh the Cons with Platform Engineering https://thenewstack.io/ibm-cloud-cto-pros-outweigh-the-cons-with-platform-engineering/ Fri, 05 May 2023 13:00:12 +0000 https://thenewstack.io/?p=22707108

One of the big discussions at KubeCon+CloudNativeCon EU 2023 was platform engineering and its value-add to developers. But what about the downside for developers when it comes to platform engineering?

The New Stack asked IBM Fellow and Cloud CTO Jason McGee about the potential downsides of platform engineering, including tool lock-in and less ability to explore other options. He agreed these are potential cons for developers.

“Developers are an opinionated bunch,” he said. “The trade-off in any kind of platform engineering model is the platform has an opinion about tools, an opinion about technologies. You’re trying to limit some choices in the spirit of more consistency and faster provisioning.”

On one hand, developers like to have choice, he said. But on the flip side are challenges like security and compliance, which have become burdens for developers, McGee added. By adopting platform engineering, developers free themselves of that burden.

Platform engineering gives developers a menu of approved tool and library options, which allows organizations to onboard developers more quickly, according to a KubeCon EU panel discussion held for the media. Developers naturally like to play with new tools — it’s part of their job, pointed out David DeSanto, chief product officer at GitLab. The problem is that can lead to tool sprawl as each developer introduces new tools and integrations, he said. That becomes a major headache in a market with a 20-23% attrition rate, the panel added, since developers may come and go without documenting what tools or integrations they used.

Platform engineering allows developers to focus on business logic and the core problems they’re trying to solve, McGee said. That’s why most developers he’s spoken with support platform engineering, he added.

“The developers who’ve done it more than once usually are pretty big advocates of platforms,” he said.

IBM Cloud’s Role

For its part, IBM views its cloud as a platform for engineering and has built capabilities around those decisions of compliance, auditing and managing costs, McGee said.

“We have, for example, a capability called Security Compliance Center, which does posture management and compliance scanning so companies can deploy workloads and be assured that the cloud is configured correctly, that it’s secure, that it follows all their policies,” he said. “We’ve been doing a lot of work on the other end of the DevOps, helping them build compliance and deployment pipelines to do code risk analysis, that do provenance on code or build software bill of materials for supply chain protections.”

IBM also has its core platform, as well as Kubernetes, OpenShift and databases, among other things that it’s building on the platform, such as reference architecture models. Additionally, IBM provides Terraform automation for a solution that enables customers to build private catalogs and customize them, he said. A platform engineering team can take IBM’s reference architectures, customize them with additional requirements, and then surface the requirements to their application teams in a private catalog of the cloud, where they can control which services their end users see, McGee explained.

“We’re doing a lot of the work to enable them to define their platforms without having to do it all themselves,” he said. “One of the risks of platform engineering is you start with VMs [virtual machines] or you start with Kubernetes and then everything else you figure out how to deploy and run yourself — observability tools and security tools and everything else. That’s a huge burden for a lot of these companies.”

While the cloud can be part of the solution for creating a platform engineering model, McGee noted that cloud spend remains in the single digits as a share of overall IT spending. Still, even on-premises companies want a cloud operating and consumption model, even when it’s running in their own data center, he added.

“There’s still huge on-prem IT estates,” he said. “A lot of the work we’re doing in financial services, for example, is really about enabling those companies to move more of their mid- and back-office, mission-critical workloads to the cloud, which have been blocked by security and compliance controls, limitations and regulation and availability and resiliency.”

CNCF paid for travel and accommodations for The New Stack to attend the KubeCon+CloudNativeCon Europe 2023 conference.

The post IBM Cloud CTO: Pros Outweigh the Cons with Platform Engineering appeared first on The New Stack.

]]>
Infrastructure as Code or Cloud Platforms — You Decide! https://thenewstack.io/infrastructure-as-code-or-cloud-platforms-you-decide/ Tue, 02 May 2023 16:34:07 +0000 https://thenewstack.io/?p=22706783

Let’s compare two prevalent approaches to cloud infrastructure management. First is what we broadly classify as Infrastructure as Code (IaC), where engineers use programming/scripting languages to build a set of scripts to achieve the desired topology on a cloud platform. Terraform, CloudFormation, Chef, Puppet and Ansible are some popular ones.

This technology consists of a language to write scripts, plus a controller that can run the scripts. Once satisfied with the result, the user would save the scripts in a code repository. Subsequently, if a change is to be made, then the files would be edited and the same process repeated.
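
For instance, a small Ansible playbook of this kind might declare an object storage bucket; the engineer runs it, checks the result and commits it to the repository. This is a minimal sketch with placeholder names:

  - name: Provision object storage for the app
    hosts: localhost
    connection: local
    tasks:
      - name: Ensure the assets bucket exists
        amazon.aws.s3_bucket:
          name: demo-app-assets   # placeholder bucket name
          state: present
          tags:
            owner: platform-team

Note the two parts described above: the YAML is the language, and the controller (here, the ansible-playbook command) is what actually applies it to the cloud.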

The second category would be a cloud orchestrator or platform. This would typically be a thin abstraction over native cloud APIs that would interface with the user as a web service, and the user would connect to the service (via UI or API) and build the cloud topology within that web service itself.

The topology built will be applied by the orchestrator and saved in its own database. The user does not need to explicitly save the configuration. When an update has to be made, the user will again log in to the system and make changes.

For smaller-scale use cases, a platform may be too heavy. But at scale, the IaC approach tends to morph into an in-house platform. A better strategy, in this case, is to use an off-the-shelf platform that can be enhanced with IaC scripts when customization is required. Megascale data centers like those belonging to Facebook and Netflix are a different ballgame and are not considered in this context.

‘Long-Running Context’

The fundamental value that a platform-based approach provides is what we call “long-running context.” People may also call this a “project” or a “tenant.” A context could map to, say, an application or an environment like demo, test, prod or a developer sandbox. When making updates to the topology, the user always operates in this context. The platform would save the updates in its own database within this context before applying the same to the cloud. In short: You are always guaranteed that what is present in this database is what is applied to the cloud.

In the IaC approach, such a context is not provided natively and is left to the user. Typically this would translate to something like “Which scripts need to be run for which context?” or maybe a folder in the code base that represents a configuration for a given tenant or project. Defining the context as a collection of code is harder because many of the scripts might be common across tenants. So most likely it comes down to the developers’ understanding of the code base.

A platform is a more declarative approach to the problem, as it requires little or no coding, as the system would generate the code based on the intent, without requiring knowledge of low-level implementation details. Meanwhile, in the case of IaC, any changes require a good understanding of the code base, especially when operating at scale. In the platform approach, a user can come back and log in to the same context a few days later and continue where they left off without having to dig deep into the code to understand what was done before.

Difference Between the Code Base and What Is Applied to the Cloud

The second fundamental difference between the two is that IaC is a multistep process (write the script, run it and merge it in the repo), while a platform is a one-step process (log in to the context and make the change). With IaC, it is possible that the user might update a script, but may also forget or postpone saving it in the repository. Meanwhile, another engineer could have made changes to the code base for their own side of topology and merged it. Now, since many pieces of code are shared for the two use cases, the first developer might find themselves in a conflict which, even if resolved by merging the code, lands them in a situation where what was run in the cloud is not what is in the repo. Now the developer has to re-run the merged code to validate, notwithstanding the possibility of causing regression. To avoid this risk, we need to now test the script in a QA environment.

All the ‘Other’ Stuff

IaC tools will enable deployments, but there is so much more to running infrastructure for cloud software. We need an application-provisioning mechanism; a way to collect and segregate logs and metrics per application; a way to monitor health and raise alerts; an audit trail; and an authentication system to manage user access to the infrastructure. Several tools are available to solve these individual problems, but they need to be stitched together and integrated into an application context. Kubernetes, Splunk, CloudWatch, SignalFx, Sentry, ELK and OAuth providers are all examples of these tools. But the developer needs a coherent “platform” to bring all this together if they want to operate at a reasonable scale. This brings us to our next point.

Much of IaC Is Basically a Homegrown Cloud Platform

When talking to many engineers, we hear the argument that Infrastructure as Code combined with Bash scripts or even regular programming languages like Go, Java and Python provides all the hooks necessary to overcome the above challenges. Of course, I agree. With this sort of code, you can build anything. However, you might be building the same kind of platform that already exists. Why not start from an existing platform and add customization through scripts?

The second argument I have heard is that Infrastructure as Code is more flexible and allows for deep customization, while in a platform, you might have to wait for the vendor to provide the same support. I think as we are progressing in technology to the point where cars are driving themselves — once thought to be little more than pure fantasy! — platforms are far more advanced than they are given credit for and have great machine-generation techniques to satisfy most, if not all, use cases. Plus, a good platform would not block a user from customizing the part that is beyond its own scope via scripting tools. A well-designed platform should provide the right hooks to consume scripts written outside the platform itself. Hence this argument does not justify building a code base for the majority of the tasks that are standard.

‘There Is No Platform That Fits Our Needs’

This is also a common argument. And I agree: A good platform should strive to solve this prevalent problem. At DuploCloud, we believe we have built a platform that addresses the majority of the use cases while giving developers the ability to integrate policies created and managed outside the system.

‘The San Mateo Line!’

A somewhat surprising argument in favor of building homegrown platforms is that it is simply a very cool project for an engineer to tackle — especially if those engineers are from a systems background. I live in Silicon Valley and have found a very interesting trend while talking to customers specifically in this area.

When we talk to infrastructure engineers, we find that they have a stronger urge to build platforms in-house, and they are quite clear that they are building a “platform” for their respective organizations and are not, as they would consider it, “scripting.” For such companies, customization is the common argument against off-the-shelf tools, while hybrid cloud and on-premises are important use cases. Open source components like Kubernetes, Consul, etc., are common, and thus I frequently hear the assertion that the wheel need not be reinvented. Yet the size of the team and time allocated for the solution is substantial. In some cases, the focus on building the platform overshadows the core business product that the company is supposed to sell. While not entirely scientific, I tend to see these companies south of San Mateo.

Meanwhile, the engineering talent at companies north of San Mateo building purely software-as-a-service applications is full stack. The applications use so much native cloud software — S3, DynamoDB, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS) — that it’s hard to be hybrid. They are happy to give the container to Amazon Elastic Container Service (Amazon ECS) via API or UI to deploy it. They find no joy in either deploying or learning about Kubernetes. Hence, the trend toward and depth of in-house customization is much less.

How many times, and by how many people, will the same code be written to achieve the same end? Time to market will eventually prevail.

The post Infrastructure as Code or Cloud Platforms — You Decide! appeared first on The New Stack.

]]>
KubeCon Panel: How Platform Engineering Benefits Developers https://thenewstack.io/kubecon-panel-how-platform-engineering-benefits-developers/ Fri, 28 Apr 2023 15:51:40 +0000 https://thenewstack.io/?p=22706684

If DevOps was about combining the workflows of operations and development, then platform engineering aims to be the solution to the problems that are created. It’s not an either/or proposition between the two, but platform engineering is an evolution of the DevOps movement, said a KubeCon+CloudNativeCon EU panel organized for the media.

“I sometimes struggle with why something has to go and die for something else to exist,” said David DeSanto, chief product officer at GitLab. “To me, there’s a lot more overlap and value with the Venn diagrams that, in essence, platform engineering is making sure your DevOps teams are more effective because they’re working with the platform, and vice versa.”

What’s a ‘Platform,’ Anyway?

In the non-technical sense, a platform is something that supports whatever you’re trying to do, said Sarah Polan, HashiCorp field CTO EMEA.

“So in the case of IT, that’s the pieces that you want to build upon to create your business value and the business problem that you’re trying to solve,” Polan said. “In our case, those are all the little pieces that it takes to deploy and give Terraform as a product, and that’s what our team is reliant on, to consume in order to be able to do that.”

The product can literally be the platform, suggested DeSanto.

“At least for us at GitLab, the platform is the product. Our platform teams are focused on gitlab.com, which is the product,” DeSanto said. “So I think it depends on who you’re talking about — [who] is the customer on the conversation?”

The panel also agreed that platform engineering is a practice, not something to buy — although there are those who will try to sell it to you, they cautioned.

“Platform engineering can be a role that we’re doing, similar to DevOps, and my belief — and talking to my peers — is it really is an outgrowth of it, an extension of what we’re doing. It is evolutionary, not necessarily revolutionary,” said Stu Miniman, director of marketing insights for hybrid platforms at Red Hat.

Why: The Point of Platform Engineering

Platform engineering is about applying best practices and ensuring some control over the development organization and the tools they use, Polan said.

“I was working for a financial institution, at the time running a Go application, so it wasn’t impacted at all by Log4j, but I still got three phone calls in the middle of the night, because … we needed to make sure that everything was properly remediated,” Polan said. “[It’s] making sure that we as organizations have proper control, but also are garnering business value based off this and we’re setting the guardrails in place to say, ‘We want you to be able to do what you want. We want you to follow industry standards, use open source, but we also want to make sure that we have adequate control over these things.’”

By having a platform of approved tool and library options, organizations can onboard developers quicker, the panel noted. Developers naturally like to play with new tools — it’s part of their job, said DeSanto, who previously worked as a developer. The problem is that can lead to tool sprawl as each developer introduces new tools — and integrations to them, he said.

“What you end up hurting is security and efficiency,” he said. “You’re essentially asking people whose specialty is not to manage infrastructure — and all inside the CI/CD pipelines as part of that — to make the decisions for themselves, but that is not going to give them the best outcome.“

That becomes a major headache in a market with a 20-23% attrition rate, Polan said.

“It doesn’t take long for someone to get frustrated and realize they can walk across the street to get another job,” DeSanto added. “So if you’re building something that’s truly working for them [developers], they’re going to stay more engaged. You’re going to have your company get better outcomes. You want happier employees, happier employees also equal more revenue.”

Why Now for Platform Engineering

Platform engineering is a movement that may seem new but has actually been growing for three years now, Miniman said. Two factors that have changed over those three years are an increase in remote work and the fact that even mom-and-pop shops are becoming technology companies to stay competitive, the panel said. It’s no longer enough to build a monolithic application and leave it in place for 20 years.

“Fifteen, 20 years ago, what we were actually building was very different, [as was] the pace of change [and] responsiveness,” Miniman said. “I can’t just build a bespoke application and leave it there for a decade anymore because then I’ll be [at] a competitive disadvantage.”

Why Developers Should Embrace Platform Engineering

“Evolution-ability” is the phrase Polan used to describe why companies should embrace platform engineering. Virtualization, 5G, and service-oriented architecture are creating a development environment that requires flexibility and modularity, Polan said.

“That also is going to be reflected within the platform teams… By giving that to a platform team, [it] allows us to go ahead and scale some of these solutions, but then also have the flexibility to move things forward,” she said. “It just gives me a better option to be more responsive and quicker, which obviously gives me a better return on investment and better security.”

But perhaps the biggest reason developers should embrace platform engineering is that it reduces the cognitive overload of programming these days, Miniman said.

“If we can make things a little bit simpler and allow developers to focus on their primary role and stuff they want to be doing, that makes it easier,” he said.

CNCF paid for travel and accommodations for The New Stack to attend the KubeCon+CloudNativeCon Europe 2023 conference.

The post KubeCon Panel: How Platform Engineering Benefits Developers appeared first on The New Stack.

]]>
KubeCon Panel Offers Cloud Cost Cutting Advice https://thenewstack.io/kubecon-panel-offers-cloud-cost-cutting-advice/ Thu, 27 Apr 2023 15:00:22 +0000 https://thenewstack.io/?p=22706272

Back in the days of on-premise compute, reducing costs meant cutting capital expenditures. But with the cloud’s pay-as-you-go model, how

The post KubeCon Panel Offers Cloud Cost Cutting Advice appeared first on The New Stack.

]]>

Back in the days of on-premise compute, reducing costs meant cutting capital expenditures. But with the cloud’s pay-as-you-go model, how can companies realize efficiencies in light of the current economic climate?

“It’s really becoming an … operational expense and impacting companies greatly,” said Aparna Subramanian, director of product engineering infrastructure at Shopify, during a Friday session at the KubeCon+CloudNativeCon Europe 2023 conference in Amsterdam. “That’s the reason why we have this increased focus on optimizing and doing more with less is the mantra these days.”

Subramanian joined Phillip Wittrock, an Apple software engineer; Nagu Chinnakaveti Thulasiraman, engineering manager in the car infrastructure department at Zalando SE; and Todd Ekenstam, principal software engineer at Intuit, for the Friday session “Cloud Computing’s First Economic Recession? Let’s Talk Platform Efficiency.” The panel looked at three broad categories of reducing costs: culture, operations and design.

Culture: Measure at App and Service Level to Find Costs

When it comes to reducing costs, the first step is creating a culture of measurement, said Wittrock.

“One thing I think it’s helpful to start with is start out measuring where your big wins are, where do you want to focus? What’s going to move the needle a lot, what’s going to take a long time to do, what’s maybe not going to move it as much but is very easy to get done?” Wittrock said. “Then from there, figure out who the right folks to engage with are, what are the right teams, so you can start looking forward.”

It can also be hard to figure out whose problem it is to increase efficiencies and cut costs, added Subramanian. That’s why it should be a cross-team effort with a financial practice or center of excellence component to it, she said.

“Often we run into the situation where it’s everybody’s problem, but it’s nobody’s problem,” she said. “Having the central team is really important but it’s also important to understand that it now suddenly doesn’t become only the central team’s responsibility for making sure the platform is efficient. It has to be a collaboration between engineering, finance, procurement — the team that is negotiating contracts with your cloud vendor or other vendors.”

Ekenstam asked the packed audience for a show of hands to determine who knows what their cloud bill is. He then asked for a show of hands from those who know how much their individual services or applications cost. Not surprisingly, the number was smaller, although not insubstantial.

“That’s, to me, the first step you need to know — what you’re spending,” Ekenstam said. “That’s the big challenge, taking that cloud costs, that big bill, and actually breaking it into individual teams, individual applications, because only then when you have that visibility will you know where you have the opportunities to improve.”

Intuit runs a developer portal where it tracks all of its different software assets, whether it’s a service or an application, he said. Each has an asset ID that is propagated and tagged to all the resources required to support that service or application. Then, IT aggregates all the billing data, attributing it by service or application, and provides a number to the development teams. Those numbers are also rolled up and provided to various directors and vice presidents.
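
To make that concrete, here is a rough sketch of how such tagging could look on Kubernetes. The label key, asset ID and image are invented for illustration, not Intuit’s actual convention:

    # Hypothetical sketch: propagate an asset ID from the developer portal
    # so billing data can be aggregated per service or application.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: payments-api
      labels:
        asset-id: svc-1234          # invented asset ID for illustration
    spec:
      selector:
        matchLabels:
          app: payments-api
      template:
        metadata:
          labels:
            app: payments-api
            asset-id: svc-1234      # propagated to pods, where cost tools read it
        spec:
          containers:
            - name: payments-api
              image: example.com/payments-api:1.0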

“It’s not enough to give a top-level CTO or the CEO the bill — you need to get that visibility to people who can actually make decisions and make changes to how the system operates,” Ekenstam said.

“That level of visibility is really the first starting point when we started looking into things more closely at Shopify,” Subramanian added. “We were able to see clearly from the cloud bill what are the different projects, what are the different clusters, but it’s not exactly helpful, right? Because if you have a multitenant platform, you want to know how much is App A costing and how much is App B costing.”

Identifying application cost can enable the platform team to go to the team or leader and hold them responsible for making the changes necessary to improve the efficiency, she added.

Don’t Automatically Cut Where CPU Is Idle

It may seem like the best plan of action would be to cut wherever there are idle resources, but that’s actually not a great idea because it could interrupt a workload that’s trying to complete, warned Wittrock.

“The idle resources may be an artifact of the capabilities of the platform you’re running on — what does it offer? — and maybe that slack just needs to be there for your availability,” he said.

That’s why it’s important to view the efficiency and waste for each application across a variety of stakeholders.

“Shopify is an e-commerce platform and sometimes we have to reserve and scale up all the way because there’s a big flash sale coming up and that time, you don’t want to be scaled all the way down and you don’t want your Cluster Autoscaler to be kicking in and doing all of these things,” Subramanian said. “There are times when you want to protect your reputation, and it’s not about efficiency.”

That’s where a central finance team can come into play, ensuring that the platform returns to normal load after big peak events like Christmas for Shopify, she added.

“That’s why you need that central finance team because there’s somebody looking at this every day and reaching out to the appropriate teams to take action,” she said.

Operations: Focus on Business Need

[Image: the panel’s three pillars of platform efficiency (culture, operations and design). Photo by Loraine Lawson]

Intuit has a number of different patterns to its workload. TurboTax is busiest during tax season, for instance, while QuickBooks is very busy during the traditional 9-to-5 work day, Ekenstam said.

“CPU, memory and compute resources is a big component of cost,” he said. “You need to really see how can you make your clusters and applications run most efficiently to minimize costs, but at the same time, provide the services that you need to.”

Shopify actually prepares for Black Friday and Cyber Monday by disabling autoscaling and scaling all the way up to projected traffic because then the goal is to protect Shopify’s reputation on those high volume days, said Subramanian.

“But at other times, we do leverage autoscaling,” she added. “We use VPA [Vertical Pod Autoscaler] to recommend what the right memory and CPU should be and we make that suggestion to the respective application team using a Slack channel.”

The application team knows the specific nature of their workload, so it’s up to them to review the recommendation and make the appropriate changes, she added.
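
As a sketch of that recommendation-only pattern (placeholder names, not Shopify’s actual configuration), a VerticalPodAutoscaler can be run with updates switched off so it only surfaces suggestions:

    # Sketch: VPA in recommendation-only mode; a human reviews before applying.
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: storefront-vpa          # placeholder name
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: storefront            # placeholder workload
      updatePolicy:
        updateMode: "Off"           # recommend only; never evict pods automatically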

Autoscaling is a key capability for reducing cloud costs, Ekenstam said, but it isn’t a panacea.

“If we can autoscale not only your application but also your cluster up and down, that’s [good] for cost,” he said. “That’s obviously the best, but it does come with some disruption. So how can you minimize that disruption? I think a lot of it starts with making sure the apps can be disrupt[ed].”

Design: Kill Kubernetes Pods to Best Utilize Resources

You can’t launch a pod in Kubernetes and expect that pod to live forever, Ekenstam said. At Intuit, they rotate their clusters every 25 days — a de-scheduler automatically de-schedules pods and reschedules them on another node, both to take full advantage of node resources and so that Intuit can apply security patches and update the Amazon Machine Image (AMI) on the nodes, he explained.

“It also has a side effect of forcing all those applications to get rescheduled and trains our developers that, ‘Hey, I can’t count on these pods running forever. It’s okay that they terminate. It’s okay that they come back up,’” said Ekenstam. “By doing that, we’ve helped build this culture of understanding how Kubernetes works for the developers.”
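
The upstream open source Kubernetes descheduler offers one way to approximate this. The policy below is a sketch with arbitrary thresholds, not Intuit’s actual setup:

    # Sketch using the open source Kubernetes descheduler (v1alpha1 policy format).
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "LowNodeUtilization":
        enabled: true
        params:
          nodeResourceUtilizationThresholds:
            thresholds:             # nodes below these are considered underutilized
              "cpu": 20
              "memory": 20
              "pods": 20
            targetThresholds:       # pods are evicted from nodes above these
              "cpu": 50
              "memory": 50
              "pods": 50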

Intuit is investigating a system that would take the recommendations from the vertical pod autoscaler and the historical metrics from each application, then use them to make decisions and recommendations for both the VPA and the horizontal pod autoscaler (HPA). The system would integrate those recommendations and then apply them to the pipeline using GitOps, he explained.

“If you change the resources of a pod, the change in resource will start back in your pre-prod environment, get tested, validate that it does work in pre-prod and work its way through the pipeline to your production environment,” Ekenstam said. “We don’t want to just suddenly change the resources in production without being able to test it first.”

Profiling Apps for Efficiency

Another step to reducing cloud spend is to ensure applications are cloud native and can run on Kubernetes, Ekenstam said. But he asked the panel what can be done beyond that.

It takes a partnership between the platform or infrastructure team and the applications team, said Subramanian.

“Something that Shopify has been working on recently is continuous profiling of applications, because you don’t want to just tell application developers … ‘make sure it’s efficient and optimal at all times,’” Subramanian said. “In order to reduce the friction, we have rolled out this continuous profiling feature so that every application is getting profiled continuously at a certain sample rate.”

That’s made it easy for developers to look at their app profile and make decisions about CPU usage, processes running, and so on, she added.

“Being able to create such tools and enable the application developers to make the right decision is also a key part of efficiency and optimization,” Subramanian noted.

At Intuit, whenever there is a new release of their platform, they run it through Failure Modes and Effects Analysis (FMEA) testing, which includes a load test, Ekenstam said.

“Then we measure how many nodes did it take to do that workload and that helps us identify some kind of performance regression, and performance regressions are also quite often cost regressions, because if you’re suddenly needing to use more nodes to do the same workload, it costs you more, so that’s another technique that we’ve used to identify and to compare different releases,” he said.

CNCF paid for travel and accommodations for The New Stack to attend the KubeCon+CloudNativeCon Europe 2023 conference.

The post KubeCon Panel Offers Cloud Cost Cutting Advice appeared first on The New Stack.

]]>
Can the Internal Developer Portal Solve Alert Chaos? https://thenewstack.io/can-the-internal-developer-portal-solve-alert-chaos/ Fri, 21 Apr 2023 18:52:53 +0000 https://thenewstack.io/?p=22705838

Platform engineering is gaining major traction, and 2023 is looking like another record year for the topic. Organizations are re-examining

The post Can the Internal Developer Portal Solve Alert Chaos? appeared first on The New Stack.

]]>

Platform engineering is gaining major traction, and 2023 is looking like another record year for the topic. Organizations are re-examining the developer experience and how platform engineering can make it better. The focus is to reduce cognitive load resulting from a shift-left approach and the increasing complexity of modern application development, achieving more developer productivity and self-sufficiency.

Alerts are numerous and spread across many tools, without much context, driving cognitive load or, worse, alert fatigue. Can alerts be dealt with using the same platform engineering tools, namely the internal developer portal? Or should alert fatigue and too many alerting tools be left outside the domain of platform engineering?

An often-overlooked superpower of internal developer portals is the ability to unify tools and practices that may not be tied to application development but to a more general “taking care of business.” This can be DevSecOps (bringing vulnerabilities and incidents into a package catalog), FinOps and more. This happens by virtue of the software catalog acting as a huge metadata store of anything related to developers and DevOps, providing the right information in context.

Last but not least, internal developer portals can play a role in unifying alerts from many tools into one central console, improving the developer experience and ensuring alerts are treated as they should be, in context.

The Software Catalog Connects Everything

According to Gartner, internal developer portals are the most mature tool in the emerging platform engineering space. In the portal, developers can access developer self-service actions, such as creating cloud resources, scaffolding microservices, performing Day-2 actions, setting up ephemeral environments and more. They can also use the software catalog to view abstractions of DevOps and software development life-cycle (SDLC) assets.

The core of the internal developer portal is the software catalog. Software catalogs come in all shapes and sizes. Some are CI/CD software catalogs; some are driven by cloud resource data, others by GitOps data. The flexibility of the software catalog allows the definition of any catalog entity whose data would shed light on how to work better, such as a running service in production. It all depends on the use case. Check out these internal developer portal templates to see the types of catalogs that can be created.

There’s a main theme here: The software catalog is a metadata store that contains everything a developer needs, and then shows developers abstractions of the data. It reduces cognitive load through redaction and whitelisting, while still allowing search on all metadata, all within role-based access control (RBAC) policies. We’ve shown how this works in a Kubernetes software catalog, essentially providing developers with a central point from which to access data about all things related to the software development life cycle.

What would be the use of unified alert data in the software catalog?

A Single Pane of Glass for All Alerts

DevOps tools, cloud resources and monitoring tools fire endless alerts, whether from Prometheus, Grafana, Datadog, Sentry, AWS, Coralogix, Splunk and others.

There are benefits to bringing alerts into a graph software catalog. When an alert fires for a specific resource, you can immediately understand the ripple effect it has on other software catalog entities. For example, if you have an alert on a cloud resource (say, max memory), you can easily see which microservices use that cloud resource and identify risks accordingly.

Many platform engineering leaders ask me whether they can add those alerts into the internal developer portal. It’s not that they are looking for yet another alert console. What they want to achieve is in-context alerts for developers in one central place without traversing many DevOps tools. It means that troubleshooting is no longer alert-oriented but oriented to the actual service or cloud resource that is affected. This is what the software catalog is all about.

The internal developer portal provides developers with a layer of visibility into the SDLC of their applications. Alerts affect the business and thus need to be looked at from the “resource” point of view, enriching the data given to developers about services and resources, not the other way around.

An alert for a cloud resource will also show the related service, its environment and the service or resource owner. This means that developers don’t need to check many alerting tools, and more importantly, developer cognitive load is reduced since there is no need to hunt for the affected services, entities, etc.

Let’s see how this is implemented in practice:

In the internal developer portal, we set up a blueprint for alerts. A blueprint is the generic building block in Port. It represents assets that can be managed in Port, such as microservices, environments, packages, clusters, databases and many more. Blueprints can be used to represent any software catalog asset. Below you can see the properties defined for the alert blueprint.

In this case, we chose the following properties for alerts:

  • Category (the alert type) — infrastructure, security, system health, etc.
  • Severity level — error, warning and info
  • Status — closed, acknowledged and open
  • Total number of alert occurrences
  • Source — a link to the relevant system with all additional information about the alert. The source can be any relevant system, such as Snyk, PagerDuty, Sentry and more, all presented under one blueprint.

In this case, we created a relation between the “Alerts” blueprint and the “Running Service” blueprint so that each alert is linked to its relevant service and environment, providing context and an inkling of the blast radius of the alert.
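
Pulling that together, a hypothetical sketch of the alert blueprint might look like the following; the field and relation names are illustrative, not Port’s exact schema:

    # Illustrative blueprint sketch; property and relation names are invented.
    identifier: alert
    title: Alert
    schema:
      properties:
        category:
          type: string
          enum: [infrastructure, security, system-health]
        severity:
          type: string
          enum: [error, warning, info]
        status:
          type: string
          enum: [open, acknowledged, closed]
        occurrences:
          type: number              # total number of alert occurrences
        source:
          type: string
          format: url               # link back to Snyk, PagerDuty, Sentry, etc.
    relations:
      running_service:              # ties each alert to the affected service and environment
        target: running_service
        many: false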

We also added “Mirror Properties” (properties that are based on relations) to display additional information in the “Alert” blueprint that will give us more context to better understand the alert:

  • The service and environment in which the alert occurs
  • Environment type, for example EKS, GKE
  • The service on call from PagerDuty (the “incident owner”)
  • The service code owners from GitHub
  • Investigate — links to the relevant monitoring dashboard for the specific service, such as Grafana

This generates alert entities in the software catalog, which we can then drill down into.

In the blueprint, we also defined developer self-service actions that can be performed on alerts, from acknowledging to rolling back and restarting services.

Using Scorecards on Top of Alert Data Is a Force Multiplier for Workflow Automation

Internal developer portals let you define scorecards on top of any software catalog entity. Scorecards let you set and track standards and key performance indicators for anything from production readiness to DORA metrics to Kubernetes standards. Scorecards help developers with “you build it, you own it” because they measure and visualize what’s important.

Scorecards provide developers with a bird’s-eye view of the resources they own as engineers (services, environments, etc.). Traditional scorecards can check whether an owner is assigned or whether a cluster meets a minimum version. Scorecards work well for alerts since they “grade” the different entities in the software catalog by their alerts. Adding this data provides a deeper understanding of the health and quality of a resource.

In turn, scorecard data associated with a specific software catalog entity can become part of a CI pipeline, for instance disallowing certain create actions when alert scorecards are poor.
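
A hypothetical sketch of such a scorecard rule, again with invented names rather than any specific portal’s schema:

    # Illustrative scorecard sketch; the property name is invented.
    identifier: alert-hygiene
    title: Alert Hygiene
    rules:
      - identifier: no-open-errors
        title: No open error-severity alerts
        level: Gold
        query:
          combinator: and
          conditions:
            - property: open_error_alerts
              operator: "="
              value: 0

A CI job could then read the entity’s scorecard level and refuse create actions that don’t meet the bar.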

The post Can the Internal Developer Portal Solve Alert Chaos? appeared first on The New Stack.

]]>
Rise of FinOps: CAST AI and Port Illuminate Your Cloud Spend https://thenewstack.io/rise-of-finops-cast-ai-and-port-illuminate-your-cloud-spend/ Thu, 20 Apr 2023 09:00:58 +0000 https://thenewstack.io/?p=22705801

AMSTERDAM — As your company grows, so does your cloud spend, and so does your carbon footprint. In the tech

The post Rise of FinOps: CAST AI and Port Illuminate Your Cloud Spend appeared first on The New Stack.

]]>

AMSTERDAM — As your company grows, so does your cloud spend, and so does your carbon footprint. In the tech industry, we just kind of take this as a given: There is simply too much power concentrated among the big three cloud computing providers — Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).

FinOps is the technical, financial and business discipline that looks to change that by creating transparency across silos and shifting financial accountability left to the developers. With it, a whole slew of tools is cropping up to help facilitate your FinOps. This is especially true at the Cloud Native Computing Foundation‘s KubeCon+CloudNativeCon Europe, because Kubernetes clusters often have the highest cloud cost, with the least amount of insight into them.

Just this week, Microsoft announced an integration with the open source Kubernetes cost management project OpenCost, which is dedicated to cost monitoring for cloud native environments.

The rise in FinOps tooling for Kubernetes will only continue. For Kubernetes workloads, FinOps is about rightsizing resources within your cluster so that scaling up and down doesn’t come with waste — wasted resources and excessive carbon output. And, with startups CAST AI and Port, it’s about taking a lot of this decision-making away from the developers, automating and optimizing that cloud spend, and reducing that cognitive load.

CAST AI Cloud Automation AI Platform

When Laurent Gil and his cofounders launched their previous cybersecurity SaaS company, Zenedge, they learned this lesson the hard way. The team’s monthly AWS cloud bill went from $1,000 in 2015 to almost $1.2 million in 2018. They experienced a 10% increase just about every month.

“Just describing the life of a SaaS product. We were using our cloud to deliver our product, so the more customers we had, the more costs we had. That’s normal,” Gil told The New Stack. “The frustration was not being able to understand what to do about it.”

His team would receive 100-page AWS bills explaining what they were spending, but “It doesn’t tell you, is that the right amount? Are we spending the right thing? Are we overspending? Underspending?” They knew what they were spending, but he continued, they didn’t know what to do about it.

So, after Zenedge was acquired by Oracle and they worked there for a while, Gil, Yuri Frayman and Leon Kuperman decided to found CAST AI in 2020 — because they couldn’t be the only ones with that issue. “We did not want to be another cost reporting tool. We really, really wanted to build an engine that will automatically look inside your cloud account, and reduce automatically, and rightsize automatically your cloud cost,” he clarified.

Now, in about a minute, Gil says, CAST AI is able to analyze all your Kubernetes clusters across the big three cloud providers and say:

  • This is your cost.
  • This is what your cost would really be.
  • Push a button to activate.

Then, the CAST AI engine goes inside and automatically optimizes within the clusters in real time. Not just once, but every few seconds, cutting CPU usage and cloud cost by 40% on average.

“You have to do this every few seconds because your traffic is never linear,” Gil explained, like when your main user base is asleep. He says the price drop is almost instant.

Besides this automated rightsizing, CAST AI also features pricing arbitrage.

“When a developer deploys an app, they have to say, how many? What kind of machine do I want? There are roughly 600 different types of machines on AWS. So the developer is asking: Which one do I take to deploy my application?” Gil argues that 99% of the time, they take the machine they already know, like the Amazon EC2 M6g instances. “That’s the only reason, and there are 600 different types of machine. Maybe there’s another one that has the same amount of compute that is actually cheaper.”

The CAST AI engine takes over that decision-making as well.

“It takes over entirely the management of your cloud accounts, cloud infrastructure for all your applications, and it has been trained to decide which VM, which machine, is the most cost-effective for the workload right now. And then think of this as ‘rinse-repeat’ every few seconds, because your application is growing. You need to add more machines. Which ones do you turn off? Which ones do you add?” Gil explained. It will also automatically move some of the on-demand workloads to spot instances. About a third of customers are multicloud, but he predicts he will see an increase in that soon enough.

In the about 14 months since launch, the few hundred CAST AI customers have saved an average between 50% and 75% of their cloud cost.

When asked if the Big Three hate CAST AI for taking away revenue, Gil responded that, actually, their biggest source of customers is referrals from AWS. Its mostly SaaS customer base then reinvests the money saved into moving more workloads onto the cloud, accelerating app modernization.

Port Internal Developer Portal Embraces FinOps Transparency

The next step in embracing FinOps is the accountability side of things. How do you understand which team or service is using which cloud services? Answering that means examining the 10-plus-step process a developer goes through to create and deploy a microservice, which, as Gil pointed out, involves deciding which machine is best to put it on, along with other cloud native knowledge that app developers really shouldn’t need to worry about.

The cloud native cognitive load for developers is ever-increasing, in part because of Kubernetes’ endless learning curve. That’s an added risk when companies try to go FinOps without careful consideration and planning.

“Most cost reporting for Kubernetes is around Kubernetes. So you can see numbers with regards to deployment, service, namespace, clusters, etcetera. The problem is that it doesn’t provide the context you really need, which is to see costs by service, team, customer or environment. Providing cost reports in the context of the business is what’s needed, and — just like everything else with Kubernetes — it’s about an abstraction layer that helps developers do their job without digging around in DevOps tools or reports,” Zohar Einy, CEO of Port, told The New Stack.

Port is an internal developer portal (IDP) that this week announced it is extending its use cases to FinOps. For Einy, since everyone in the company is directly or indirectly dependent on the software being developed, it becomes necessary for everyone to understand their expenses, connecting back to the business.

The challenge is, as previously mentioned, cloud provider reports, while extensive, are kind of meaningless to all three pillars of FinOps — finance, engineering and business. To effectively measure cloud spend, it needs to drill down to the team or microservice level.

The necessary data is there, assures Einy, but it’s just scattered across your existing cloud reporting tools. Port now integrates with those tools like Kubecost, to offer cloud reporting via a common, visual language across these silos.

“We see DevOps teams as a major beneficiary of IDPs, and FinOps specifically, since, instead of organizing FinOps reports, or trying to manage all of their DevOps asset information, they can use the software catalog in the internal developer portal,” Einy said. FinOps is a logical extension of the common purpose shared with platform engineering — reducing developer cognitive load while increasing control and transparency.

Is FinOps the Fastest Path to GreenOps?

We already know that cloud cost is the closest proxy we have for the environmental impact of your software development lifecycle. After all, data centers are the driving force behind the tech industry having the fastest-growing carbon footprint of all. And there are no signs that growth will slow any time soon.

Using a no-code developer portal like Port enables companies to represent their regions — there are certainly more or less environmentally friendly ones — and providers, while tying them to expense. FinOps becomes a natural and important driver of GreenOps.

CAST AI is commissioning a study in an attempt to measure its positive environmental impact. After all, “We eliminate things that are switched on, but that we don’t use. If you think of your footprint using 100 servers, with us, on average 40 servers are going to be shut down because they are not necessary anymore,” Gil said.

“These are things you don’t need and don’t use.” That’s definitely the theme of the first stage of GreenOps — and FinOps optimization to boot. Can’t wait to see what the next steps are.

Check back often this week for all things KubeCon+CloudNativeCon Europe 2023. The New Stack will be your eyes and ears on the ground in Amsterdam!

The post Rise of FinOps: CAST AI and Port Illuminate Your Cloud Spend appeared first on The New Stack.

]]>
Otterize: Intent-Based Access Control for Kubernetes and Cloud https://thenewstack.io/otterize-intent-based-access-control-for-kubernetes-and-cloud/ Wed, 19 Apr 2023 16:51:46 +0000 https://thenewstack.io/?p=22705698

AMSTERDAM — As services become more and more distributed, organizations need to control not only who can access them from

The post Otterize: Intent-Based Access Control for Kubernetes and Cloud appeared first on The New Stack.

]]>

AMSTERDAM — As services become more and more distributed, organizations need to control not only who can access them from the outside but how internal services can communicate with each other. Whether it’s protecting sensitive data silos or how the shopping cart service communicates with the customer log-in service, each use case requires different authentication, authorization, and configuration.

Lately, the burden of choice of who can do what sits on the shoulders of the platform engineering team.

“Developers increasingly have a choice of great technologies to address their needs, and so they are adding more and more technologies. But then you get this cacophony of multiple technologies, multiple stacks and multiple services communicating with each other that aren’t normalized in any way,” Otterize CEO and co-founder Tomer Greenwald told The New Stack.

“So you can have four or five teams within the same department working on different stacks, they can use Go and Python and Java, and they may use Kafka, but some use hosted Kafka, some use MSK [managed via Amazon], and some use Confluent Cloud because that’s what’s easy for them. But then there’s a burden that we saw that’s increasing in the platform teams of managing everything.”

Otterize looks at intent-based access control (IBAC) as the solution to ease platform pain. Otterize OSS is an open source secure connectivity management tool built for use within Kubernetes clusters; today at KubeCon+CloudNativeCon Europe, the company is launching its Otterize Cloud add-on for platform engineering teams. Otterize — sweetly named after the way “authorize” is pronounced with a Hebrew accent, with a cute animal mascot to boot — virtualizes enforcement point management across Kubernetes clusters and now the cloud.

What Is Intent-based Access Control (IBAC)?

The team at Otterize, a recently funded startup based in Tel Aviv and San Francisco, describes intent-based access control, or IBAC, as a modern, declarative approach to granting access automatically, responsibly and scalably.

“What we’ve built here is a solution to allow platform engineers and developers to manage secure access for their platforms and systems without having to think about authentication and authorization and everything related to the processes that they have to take to get that secure access,” said Greenwald.

Otterize is unique, he says, as an un-opinionated authentication and authorization solution.

“Otterize doesn’t talk about how to configure stuff or how do you want to secure things. Do you want to use mTLS [mutual transport layer security] or JWTs [JSON web tokens]? Do you want to use Kubernetes network policies or Istio service mesh policies or Kafka ACLs?” Greenwald continued. Instead, developers describe their functional needs in a declarative way. “Let’s say I’m a developer working on the checkout service, and I want to make a secure call to the order service. I simply tell Otterize ‘I want to get access to the order service’. And if I want to connect to a Kafka broker, I simply tell Otterize, ‘I want to connect to the Kafka broker’,” he explained.

Otterize follows a conversational, declarative pattern, with a simple YAML file — which is called an “intents file” within the tool — expressing the needs from the client’s perspective. It doesn’t ask about the identification or security needs, but asks the developers about their functional needs. From there, the intents are committed, reviewed and approved just like source code. Otterize plugs into the CI/CD pipeline to then manage and configure the access controls according to those intents.
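
In the open source project, those committed intents take the shape of a Kubernetes custom resource. Here is a minimal sketch for the checkout-to-order example above (the apiVersion may differ between releases):

    apiVersion: k8s.otterize.com/v1alpha2   # version may vary by release
    kind: ClientIntents
    metadata:
      name: checkout
    spec:
      service:
        name: checkout              # the client declaring its functional needs
      calls:
        - name: order               # "I want to get access to the order service"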

“We don’t require you to install a new enforcement point or deploy a new service mesh or use some custom SDK as new enforcement points,” Greenwald said. “We actually believe that most teams have the right enforcement points available already, and all we need to do is connect to them and configure them: configure their service mesh, configure their API gateways, or Kafka brokers, to enforce intended-access-only. The hard problem in most cases isn’t how to control access but rather what access should be granted or not.”

Otterize acts as a virtualization of enforcement points, taking away the need for developers to install and configure anything for authorized enforcement — or ping platform teams asking them to do it for them.

“We allow intended access. We block unintended access. But, most of all, we do it without having to get every developer to understand every type of enforcement point and every type of authentication mechanism,” he said.

This version of Otterize is already released as an open source solution that can be installed on a single Kubernetes cluster, manage network policies, and manage authentication authorization for Kafka using mTLS and Kafka’s built-in access control lists (ACLs).

The information of which client service has access to which servers always lives with the client source code, maintaining consistency as the code evolves, and always keeping least-privilege access to servers.

For KubeCon, Otterize is also releasing its first service mesh integration, with Istio, as well as a Kafka visibility capability that shows which clients access which topics, and which clients are consuming what they shouldn’t. This improves Otterize’s capabilities to operate in “shadow mode,” showing what would happen if enforcement were turned on, before anything has a chance to break.

The team intends to apply for the Cloud Native Computing Foundation sandbox soon.

Otterize Cloud for Platform Engineering Teams

As Greenwald referenced at the beginning of this piece, developers have a freedom of choice, but that freedom is making the job of platform teams much harder. With that in mind, also today at KubeCon, Otterize announces its new Otterize Cloud, which extends the open source in-cluster enforcement with a global view and insights in the cloud.

This cloud offering allows platform teams more visibility and insights — this would be blocked, this is not protected, etc. — to let them preview what changes will be made, and how that will affect telemetries and alerts. Otterize Cloud allows teams to connect multiple open source installations on different clusters and then puts them behind a single pane of glass.

As Otterize Cloud evolves, Greenwald said, it will support more complex layouts where one Kubernetes calls to another. Previously, this would have required a lot of extra knowledge and configuration by developers or their platform teams, which again Otterize would now abstract away, providing a unified view of the “access plane” of all a company’s services.

This enables simpler transitions, he continued, between different best-of-breed technical alternatives, like transitioning from a hosted to a cloud-managed Kafka, or moving from network policies to Istio, or vice versa. Otterize doesn’t limit choice; it just muffles the cacophony platform teams have to continuously contend with, so they no longer have to create custom platforms to fit all the different developer demands. Instead, Otterize is normalizing identities, permissions and authentication across an organization.

This single view, visually presented in Otterize Cloud as the “access graph,” also allows platform teams to map out how different services will interact with each other, perhaps using shadow mode to preview the effects of enforcement before activating enforcement. “The access graph allows you to visualize and understand what’s going on when you’re using the different enforcement methods and how they’re going to affect your traffic,” Greenwald said.

This includes whether discovered intents — what calls services are actually making — match declared intents. In particular, it flags if there are services that would now be blocked because no one declared the calls as intended. This allows for mistakes to be caught before they are released.

Check back often this week for all things KubeCon+CloudNativeCon Europe 2023. The New Stack will be your eyes and ears on the ground in Amsterdam!

The post Otterize: Intent-Based Access Control for Kubernetes and Cloud appeared first on The New Stack.

]]>
Cloud Control Planes for All: Implement Internal Platforms with Crossplane https://thenewstack.io/cloud-control-planes-for-all-implement-internal-platforms-with-crossplane/ Thu, 13 Apr 2023 17:00:51 +0000 https://thenewstack.io/?p=22704448

Over the last two decades, the rise of cloud computing has enabled the creation of countless new businesses and revolutionized

The post Cloud Control Planes for All: Implement Internal Platforms with Crossplane appeared first on The New Stack.

]]>

Over the last two decades, the rise of cloud computing has enabled the creation of countless new businesses and revolutionized how existing organizations operate and scale. The value offered by cloud-service providers during this transition cannot be overstated, but there have also been tremendous advantages for the cloud-service providers themselves.

The efficiencies of scale that are distributed to customers are experienced to an even larger degree internally, allowing for these platforms to grow and expand at a rapid rate. With Crossplane, an open source framework for building cloud native control planes, customers can start to realize those efficiencies in their own internal platforms.

The Value of Foundation Services

Most cloud-service providers started out with a few core offerings in storage and compute. Amazon Web Services (AWS) famously began with just S3 before expanding into compute with EC2. These services have provided, and continue to provide, immense value to customers who previously would have had to invest a lot of time and money in both managing physical infrastructure and writing software. However, they also serve as a foundation for AWS to rapidly expand its catalog of services.

The reason cloud-service providers are able to launch new services at an increasingly rapid rate is that the new services can be built on the capabilities of the existing infrastructure.

For example, AWS Lambda, a popular serverless platform, was launched at scale because it is built on top of the battle-tested compute offered by EC2. Furthermore, these new services are able to integrate with others offered by the cloud provider, such as databases and storage, and a common identity and access management (IAM) framework can be extended to support the new functionality. The result is a flywheel effect where, in aggregate, the cost of launching the next service diminishes over time.

When a cloud provider has robust compute, network, and storage capabilities at their fingertips, it becomes much easier to implement new services on top of those primitives.

While customers have benefited from this capability in the form of having new services made available to them, they have typically been unable to develop the same competency when building their own internal platforms. The impact is twofold:

  • Customers are subject to the decisions of cloud providers when it comes to what new services are launched. Often these decisions are heavily influenced by the provider’s largest customers.
  • Customers must take on the burden of onboarding and “productionizing” new services, which typically includes extensive testing, documentation and compliance review.

It is rare that an organization of reasonable size exposes raw cloud infrastructure to developers. Instead, they implement some layer of abstraction over the API in order to ensure that only infrastructure that meets internal policies can be provisioned. These policies may enforce certain security, reliability or cost requirements. For example, developers may only have the ability to provision some subset of EC2 instance types in order to avoid incurring high costs.
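
For instance, that EC2 guardrail can be expressed with AWS IAM’s ec2:InstanceType condition key. The policy below is simplified to the instance resource alone, and the approved instance types are arbitrary:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowOnlyApprovedInstanceTypes",
          "Effect": "Allow",
          "Action": "ec2:RunInstances",
          "Resource": "arn:aws:ec2:*:*:instance/*",
          "Condition": {
            "StringEquals": {
              "ec2:InstanceType": ["t3.micro", "t3.small"]
            }
          }
        }
      ]
    }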

Unfortunately, every time a new cloud service is introduced into an organization, these policies have to be mapped to that service’s API. While the cloud provider may be offering new products by building on top of its own foundational services, every new service they expose to customers is a net new API to be onboarded in an organization.

This linear trend in the overhead of onboarding the next service stands in stark contrast to the logarithmic curve that cloud providers experience when launching that same service.

When every new service offers its own API, the effort invested in vetting previous services does not reduce the effort required to vet the next one.

Ideally, doing the hard work of onboarding a service in an organization would be amortized over the subsequent services it enables. More and more organizations are moving to building internal platforms, rather than just consuming cloud-service primitives. This pattern creates a unique opportunity for these organizations to build their own foundational services, which can subsequently be used to build higher-level services in the future.

However, operating a platform is complex, and cloud providers have done a lot of work to build control planes that power theirs, presenting a single API surface and ensuring that services are resilient to failure. Replicating this amount of engineering effort in most organizations is not feasible. Fortunately, the Crossplane project has taken the capabilities of Kubernetes and constructed a framework that makes building control-plane-based platforms possible for anyone.

Defining Your Platform Foundation

When starting out on their internal platform journey, every organization must determine what APIs will serve as their foundation services. Countless factors could determine where to begin, including size of the company, complexity of the product, and culture of the engineering organization. Starting with a commonly used primitive, such as a database or containerized workload, is usually a safe bet.

In Crossplane, platform APIs are defined using Composition, which allows a platform team to define an API for developers to interact with, and one or more implementations of that API. For example, if developers frequently deploy applications packaged as OCI images, a platform team may define a ContainerizedWorkload API with parameters for an image reference and a set of environment variables.

An implementation of that API could be an AWS Lambda function, an ECS Task Definition or any other set of compute primitives. While these AWS services offer a wide array of customization options, only the parameters that a developer cares about are exposed in the internal platform, allowing the platform team to incorporate any organizational policies, such as tags or memory limits, behind the scenes.
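
A minimal sketch of how that developer-facing API could be declared in Crossplane follows; the group, kinds and field names are placeholders:

    # Sketch of a Crossplane CompositeResourceDefinition for the workload API.
    apiVersion: apiextensions.crossplane.io/v1
    kind: CompositeResourceDefinition
    metadata:
      name: xcontainerizedworkloads.example.org
    spec:
      group: example.org
      names:
        kind: XContainerizedWorkload
        plural: xcontainerizedworkloads
      claimNames:
        kind: ContainerizedWorkload
        plural: containerizedworkloads
      versions:
        - name: v1alpha1
          served: true
          referenceable: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    image:
                      type: string          # OCI image reference
                    env:
                      type: object          # environment variables
                      additionalProperties:
                        type: string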

An internal platform codifies policy, allowing organizations to take advantage of the value offered by cloud infrastructure while establishing guardrails for operations that developers are able to perform on their own.

Providing the initial implementations of the ContainerizedWorkload is a high-touch activity that requires understanding of the cloud-service APIs it relies on. In other words, this is the early part of the logarithmic curve, where there is still significant toil required to add a new API. However, your platform now has its first service that is fully compliant with organizational policy. You’ve defined a foundation service.

Growing with Confidence

In the early stages of building a platform, much of the work will involve building out additional foundation services. Perhaps developers also frequently need to provision databases, so a MySQLInstance API is defined and implemented. However, as the core set of foundation services are rounded out, the cost of adding the next service starts to decrease.

For example, with a workload and database API in place, a StatefulWorkload API can be added to the platform with minimal effort because it relies on foundation services that have organizational policy baked in. In other words: compliance composes.
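
Sketched in Crossplane terms, with placeholder names again, the higher-level Composition simply stacks the two existing, already-compliant APIs:

    # Sketch: a Composition that stacks APIs rather than implementations.
    apiVersion: apiextensions.crossplane.io/v1
    kind: Composition
    metadata:
      name: statefulworkload.example.org
    spec:
      compositeTypeRef:
        apiVersion: example.org/v1alpha1
        kind: XStatefulWorkload
      resources:
        - name: workload
          base:
            apiVersion: example.org/v1alpha1
            kind: XContainerizedWorkload    # foundation API; policy baked in
            spec: {}                        # parameters patched in from the composite
        - name: database
          base:
            apiVersion: example.org/v1alpha1
            kind: XMySQLInstance            # foundation API; policy baked in
            spec: {}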

Validation and compliance are transitive. If a higher-level API is built only on other compliant APIs, then it too is compliant.

Furthermore, implicit in this architecture is the propagation of platform changes and improvements from lower-level services to higher-level ones. New organizational mandates, for anything from SOC2 compliance to switching cloud providers, can be carried out at the foundation level but applied to all levels above.

Additionally, transitioning a higher-level API, such as StatefulWorkload, to rely on cloud services directly instead of internal platform APIs can be carried out at any time because the platform is built by stacking APIs, not by stacking implementations.

Your First Platform in 5 Minutes or Less

Defining a platform is inherently iterative, and Crossplane makes getting started on the journey easy. Grab a Kubernetes cluster from your favorite provider, install Crossplane and pick a QuickStart package to implement your first APIs!

The post Cloud Control Planes for All: Implement Internal Platforms with Crossplane appeared first on The New Stack.

]]>
Why You Should Run Your Platform Team Like a Product Team https://thenewstack.io/why-you-should-run-your-platform-team-like-a-product-team/ Wed, 12 Apr 2023 15:26:54 +0000 https://thenewstack.io/?p=22705062

I’ve been working in software engineering and DevOps for about a decade. For the last five years, I’ve worked on

The post Why You Should Run Your Platform Team Like a Product Team appeared first on The New Stack.

]]>

I’ve been working in software engineering and DevOps for about a decade. For the last five years, I’ve worked on platform teams and with other infrastructure-focused teams that are trying to become platform teams, even if they didn’t realize it at the time. With that in mind, I have a suggestion for how to help ensure platform team success: run them like a product team.

What Is a Platform Team?

Platform teams focus on building and maintaining core systems and workflows for delivering infrastructure and other services to application teams. Since applications run on the foundation of infrastructure, supporting infrastructure is a big job. Not only does platform engineering encompass what you would traditionally think of as “infrastructure” — virtual machines, compute clusters and networking — it also includes all the glue that binds the worlds of applications and infrastructure. These include APIs, monitoring, CI/CD pipelines, credential management and more. All these things centralized under one umbrella comprise the “platform” in “platform team.”

Platform Teams and Modernization

Platform teams address a historical problem with infrastructure provisioning and configuration. Previously, creation, updates and deletion of infrastructure involved manual processes, such as pointing and clicking to create infrastructure or ad hoc scripting. In cloud-based environments, infrastructure is being automated and spun up in a matter of minutes. New features are getting built and deployed to production in a matter of days.

Besides supporting application teams, platform teams also have the responsibility of ensuring that the workflows they build remain compliant with industry and company standards in the face of this acceleration. In the same way that product teams have to gather feedback from their customers and listen to key stakeholders, platform teams need to collaborate with application teams while also meeting requirements from security, compliance, finance, etc.

While this means platform teams need to sometimes keep developer requests from straying outside of those stakeholders’ boundaries, it’s more often the case, in my experience, that many organizations will continue to build legacy compliance processes around manual changes to infrastructure that do not align well with modern automation practices.

To modernize, platform teams must challenge the status quo in a way that’s palatable to key teams inside an organization. Platform teams have to talk with these teams to understand why certain policies are in place and figure out how to automate their systems safely. This diplomacy can be an even bigger challenge than a platform team’s technical responsibilities.

Platform Teams Are Product Teams

To meet that challenge, a platform team should be run like a product team.

Product teams understand they serve a customer and that the customer’s constant feedback is crucial. So, who is the platform team’s customer? Application development teams.

Use Product Management Strategies

With any product, the first step is user research. Platform teams need to establish feedback loops with their users before they start building the platform. They need requirements, scoping and prioritization. They’ll need to version infrastructure components, perform maintenance, build awareness, encourage adoption and communicate news. Platform teams also need to think in terms of “features.” Like any product team, platform teams should have an idea of what features they want to deliver to their customers based on customer feedback and business needs.

For platform teams, the features are often infrastructure-related capabilities for different types of application architectures. For example, a platform team may be working on a containerization capability to enable application teams to push apps to a particular runtime medium and ensure the deployment meets organizational standards. Other common capabilities include serverless, GPU-intensive workloads and frontend apps spanning web and mobile distributions. These capabilities are cloud-agnostic.

Treat the platform-building process like any other software project. Take an iterative approach and never consider the platform “done.” Don’t forget all the battle-tested product management strategies the IT world has refined over the past several decades just because you’re building something internal and not external.

For inspiration, see Elanco’s hybrid multicloud adoption case study, which shows how the animal health vendor gathered user feedback from architects, developers and enterprise teams. It also includes its blueprint for progressing through the analysis, design and automation phases of its platform.

To learn some of the battle-tested product management strategies that have worked for platform teams, watch ThoughtWorks’s Poppy Rowse and Chris Shepherd’s talk “Non-Technical Challenges of Platform Engineering.”

Work with Stakeholders and Get Buy-In

Product teams must also understand the needs of their stakeholders. Stakeholders for the platform team include every other team that is part of the organization’s infrastructure delivery pipeline. These teams may go by different names but include networking, traditional IT, security, identity, finance, risk, compliance and so on. The platform team’s success is inextricably linked to the success of these functional teams, and without them, the funding and support won’t be there when you need it.

The platform team’s main goal is to help developers safely ship software as quickly as possible while meeting the needs of organizational stakeholders. The organization stakeholders are often looking for secure-by-default infrastructure workflows, compliance guardrails, reduction of tickets and inefficiencies and the reduction of costs through the elimination of infrastructure sprawl among other requirements. Many of these areas are blind spots for development teams but are critical to the “safety” element of shipping software.

The key is to tightly integrate representatives from all stakeholder teams into your ongoing communication and requirements-building processes, just as you would in an external product development workflow.

For examples of working with stakeholders, learn from Controlware’s Christian Bergner’s talk, “Building a Cloud Operations Mindset in the Financial Sector: A Diary of Change.” It features two case studies and describes the process of getting every single function around the same table to talk about the platform.

Should Platform Teams Exist?

Platform teams exist to cope with infrastructure’s generational shift from manual, human-driven processes to automated self-service processes. The skill sets required to manage these new processes are drastically different. Platform teams represent a specialization to address subject-matter differences in infrastructure and application development.

The business need to quickly deliver new products and features to customers has never been greater, which puts ever more pressure on application teams. This means infrastructure has to adapt just as quickly as software gets updated. Infrastructure operators have to think like software engineers if they wish to rise to the challenge. Platform teams are an essential part of that transition, blending the worlds of infrastructure and software engineering.

The post Why You Should Run Your Platform Team Like a Product Team appeared first on The New Stack.

]]>
Learn 12 Factor Apps Before Kubernetes https://thenewstack.io/learn-12-factor-apps-before-kubernetes/ Tue, 11 Apr 2023 13:00:08 +0000 https://thenewstack.io/?p=22704884

Have you ever worked at a company where you struggled with containerized apps but couldn’t quite express why? My initial

The post Learn 12 Factor Apps Before Kubernetes appeared first on The New Stack.

]]>

Have you ever worked at a company where you struggled with containerized apps but couldn’t quite express why?

My initial experiences with containers were at a company implementing them in every wrong way imaginable. For example, they ran databases inside a container with no external volumes. You read that right: They wrote the database storage to an aufs file system, which is not designed for long-term storage and is also very slow. When I mentioned this was a terrible idea because we could lose all the data, the answer was, “We are doing snapshots, so we are fine.”

The first apps they put into containers were not much better:

  • They didn’t use environment variables for configuration; instead, they hardcoded configuration and mounted config files.
  • The app died immediately when the database was unavailable; it didn’t wait or retry until it became available.
  • Log messages were terrible, or logs went to files instead of stdout.
  • They ran admin processes such as database migrations with a different app.
  • Apps were needlessly stateful.

I solved most of these issues with entry-point scripts, as mentioned by Kelsey Hightower. But that’s a hacky solution to make up for terrible design. I remember coyly asking the developers to redesign their apps to address all these issues, with only my opinion to back me up. So I went online to do some research and found 12 Factor apps, which not only expanded and validated my points, but also gave me an excellent framework to back up my arguments.
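For readers who haven't seen what those fixes look like in application code rather than in entry-point scripts, here is a minimal Python sketch addressing three of the problems above: configuration from environment variables, waiting for an unavailable backing service, and logging to stdout. The variable names and the TCP probe are illustrative, not a prescribed implementation.

```python
import logging
import os
import socket
import sys
import time

# Config (factor III): read settings from the environment, never hardcode
# them or bake config files into the image.
DB_HOST = os.environ.get("DB_HOST", "db")
DB_PORT = int(os.environ.get("DB_PORT", "5432"))

# Logs (factor XI): write to stdout and let the platform collect the stream.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
log = logging.getLogger("app")

def wait_for_backing_service(host, port, attempts=30, delay=2.0):
    """Factor IX: don't crash because a backing service is briefly down;
    probe it and retry until it answers (or the attempts run out)."""
    for attempt in range(1, attempts + 1):
        try:
            socket.create_connection((host, port), timeout=2).close()
            log.info("database is reachable")
            return
        except OSError:
            log.warning("database not ready (attempt %d/%d)", attempt, attempts)
            time.sleep(delay)
    raise RuntimeError("database never became available")

wait_for_backing_service(DB_HOST, DB_PORT)
```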

Why Learn 12-Factor Apps

The 12-factor app methodology is a set of best practices for building portable, cloud-ready applications, and it maps directly onto containers. Heroku introduced these practices in 2011, and they have since been widely adopted by software development teams worldwide.

The 12 factors provide a framework for building scalable, portable, maintainable and resilient applications. But perhaps the most important benefit is that they create apps that are easy to operate because they are designed to work seamlessly with any container orchestrator.

Kubernetes (K8s) works best with 12-factor apps because the methodology captures the design assumptions of containerized applications. As a container orchestrator, K8s is built on the assumption that your applications are 12-factor: stateless, disposable and configured from the environment.

Venturing into container orchestration without knowing how to engineer container apps will make those apps significantly more tedious to operate and less scalable. Sure, you can make monoliths and poorly designed apps run in Kubernetes. For example, you can mount volumes, run StatefulSets and even do vertical autoscaling. But ultimately, you will have to contend with high operational costs:

  • Codebase: Apps with a shared codebase are harder to update and maintain.
  • Dependencies: Time is spent finding and installing dependencies that should be explicitly declared and packaged with the container.
  • Config: Time and engineering effort go into entry-point scripts and/or custom images built from source just to change hardcoded configuration.
  • Backing services: Changing backing services requires costly, time-consuming migrations and/or significant downtime.
  • Build, release, run: Treating app code and the running server as one leads to snowflake servers, painful maintenance and costly upgrades.
  • Processes: Apps that share state cannot scale horizontally, nor can they be seamlessly replaced during upgrades.
  • Port binding: You end up maintaining a web server container such as Tomcat, which adds significant configuration overhead and inflates the app's runtime.
  • Concurrency: Apps not designed with concurrency in mind may use excessive resources, making them a poor choice for scaling.
  • Disposability: Without graceful shutdown and crash-tolerant request handling, you get data loss and performance issues.
  • Dev/prod parity: It becomes impossible to predict how an app will behave in production; downtime increases and deployment velocity suffers.
  • Logs: Shipping logs to a log warehouse becomes tedious; container orchestrators expect logs on stdout.
  • Admin processes: Time is wasted maintaining a separate process for tasks that should be part of the app, or running them manually.
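Of these, disposability is the factor most often overlooked, so here is a minimal sketch of what implementing it might look like: trap SIGTERM, stop accepting new work, finish what's in flight and exit cleanly within the orchestrator's grace period. The worker loop is purely illustrative.

```python
import signal
import sys
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Orchestrators such as Kubernetes send SIGTERM before force-killing a
    # container; flip a flag so the main loop can finish in-flight work.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

while not shutting_down:
    # Placeholder for one unit of real work (handle a request, pop a job...).
    time.sleep(0.1)

# Flush buffers, close connections, ack or requeue jobs here, then exit.
sys.exit(0)
```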

Platform Engineering and 12-Factor Apps

Platform engineering helps deliver excellent self-service and great developer experience with an internal developer platform (IDP). And an IDP significantly reduces cognitive load for developers by providing golden paths and multiple abstractions based on user roles.

In platform engineering, 12-factor apps are important because developers self-serve their applications and infrastructure needs with an IDP. Internal developer platforms generally leverage container orchestration and shift the operation of services to developers, meaning that operational excellence is paramount to mitigate all the issues described above.

A platform orchestrator like Humanitec sits at the center of your IDP, making it easy to deploy workloads and all their resources to all environments with a simple workload specification.

Humanitec uses K8s to deploy workloads; therefore, designing 12-factor apps is crucial for maintaining high operational performance. When using an IDP, developers self-serve their infrastructure and configuration needs, including deploying and operating applications. If they use non–12-factor apps, they will experience all the pain points described above.

For example, let’s say you have an application that uses a database. If it isn’t a 12-factor app, you may need to mount its configuration files on disk, while the tooling you use to automate deployments is likely designed to pass configuration as environment variables. If you have multiple environments, the mismatch compounds with each one.

Overall, 12-factor apps make deploying, managing and scaling applications easier. They also make it easier to collaborate with other developers.

Conclusion

The 12-factor apps framework provides best practices for building containerized applications that are scalable, portable, maintainable and resilient. They are essential for maintaining high operational excellence when deploying and operating applications in the cloud.

Platform engineering helps developers consume their own infrastructure and easily operate their own services, but the services must be designed with these tools in mind. We encourage all developers to adopt the 12-factor methodology to make their applications easier to operate.

Have your 12-factor apps ready? Standardize and deploy them with Humanitec today.

The post Learn 12 Factor Apps Before Kubernetes appeared first on The New Stack.

]]>
A Platform for Kubernetes https://thenewstack.io/a-platform-for-kubernetes/ Wed, 05 Apr 2023 17:00:33 +0000 https://thenewstack.io/?p=22703486

The notion of internal developer platforms (IDPs) is very popular at the present time. Software engineering teams of all types

The post A Platform for Kubernetes appeared first on The New Stack.

]]>

The notion of internal developer platforms (IDPs) is very popular at the present time. Software engineering teams of all types have it at the top of their wishlist. Teams within large organizations are able to devote resources exclusively to the development and maintenance of IDPs, but the demand is ubiquitous.

Based on observations from the KubeCon + CloudNativeCon NA 2022 floor, several leading surveys and general feedback, we concluded that the Kubernetes community can greatly benefit from platforms that simplify its adoption and day-to-day usage.

Kubernetes Is Powerful but Complex

Since the nature of the problem solved by Kubernetes is a complex one, Kubernetes itself can be difficult to manage. The technology handles many important tasks involved in running an application, such as persistent volumes, load balancing, secrets and authentication, service discovery and autoscaling. All have the intent of liberating application developers. Instead, we find that application developers end up being tasked with knowing Kubernetes primitives, which can be a significant cognitive overhead. Over time, so many new features have been built into Kubernetes that it has started to work in myriad ways, which is demonstrated in this excellent talk by Viktor Farcic.

In the past, we have written about how DevOps practices take shape over time. Our observation is that many small factors add up to bigger outcomes. The use of platforms for software engineering teams is no different. Small yet significant outcomes for engineers bubble up as important results for product teams. In turn, this leads to meaningful development for the business units, culminating in success for the company as a whole. For those in the business of delivering software products to their customers, this paradigm can be illustrated using the pyramid below.

[Figure: pyramid of blocks]

What Should Teams Shoot for?

Individual software teams need to be equipped with the right tools and processes in order to deliver the desired results. Both velocity and accuracy are paramount in order to release functionality. Defining the right kind of Service Level Objectives (SLOs) is critical to meeting availability and reliability requirements. SLOs can take many forms: a percentage for uptime, units of seconds or milliseconds when defining (un)acceptable latency, or volume when defined for throughput. In the event that SLOs are not met, engineers should have sufficient knowledge and transparency to investigate and ascertain the root cause. Teams should be able to collaborate and work together to inspect problematic areas, especially those that are cross-functional.
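As a worked example of how an uptime-percentage SLO translates into concrete numbers, the short calculation below converts a monthly availability target into an error budget; the 99.9% figure is just an illustration.

```python
# How much downtime does a 99.9% monthly availability SLO actually allow?
slo = 0.999
minutes_per_month = 30 * 24 * 60          # 43,200 minutes in a 30-day month

error_budget_minutes = (1 - slo) * minutes_per_month
print(f"Error budget: {error_budget_minutes:.1f} minutes per month")  # 43.2
```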

Upon such a foundation of sound engineering practices, value addition can happen downstream. High-quality software can be engineered and delivered by teams designed to be dynamic and agile. Teams should be able to gain visibility into every kind of operational failure, including application errors, security misconfigurations and vulnerabilities, in addition to latency and reliability issues in production.

Armed with these capabilities, engineering teams can promise faster development cycles by consuming standardized environments and frameworks.

Platforms Are Everywhere

Platforms are not limited to cloud computing. They can be used to improve the workflow of developers involved in all kinds of stacks. For example, blockchain developers have options such as Avalanche, which provides primitives that let developers build decentralized apps utilizing exchanges, platforms or contracts as needed. Similarly, Fermyon is positioned as a platform for WebAssembly (WASM). With just a few commands, a developer can spin up a WASM application and deploy it. The combination of CLI and web UI allows developers and operators to work together on the platform to manage applications. Another example of a platform for WASM is Cosmonic.

Overall, a platform is intended to help organizations to build better software faster, with improved collaboration, quality and scalability.

A Platform for Kubernetes

Kubernetes is a technology that can greatly benefit from having a paved path to production associated with it. Using a platform, application developers can simplify the deployment process, deploying to any remote instance without manually configuring individual clusters.

A platform consumed as a PaaS tool is also of great value when services have to be integrated with applications, especially when working with Kubernetes. Built-in support for databases, messaging and monitoring saves enormous amounts of time and liberates developers from significant toil. It also improves the production parity of applications by making the same grade of services available on all remote instances. This increased level of automation improves deployment accuracy and can help scale applications on demand.

Conclusion

We believe that the Kubernetes community can greatly benefit from having platforms built to abstract away some inherent complexity. A definitive set of tools, processes, and services that surround Kubernetes clusters in order to establish developer self-service is key in helping improve the adoption of Kubernetes. With this, software engineering teams will be able to build, test and deploy their applications with efficiency and effectiveness.

The Cloud Foundry Foundation will be sponsoring KubeCon + CloudNativeCon Europe 2023 this April in Amsterdam. We’ll be hosting a booth with our contributors present to discuss platforming on Kubernetes. We invite you to please stop by for a chat and check out Korifi — The Cloud Native Platform For Cloud Foundry.

The post A Platform for Kubernetes appeared first on The New Stack.

]]>
Why Create a Developer-Focused Kubernetes Platform and How https://thenewstack.io/why-create-a-developer-focused-kubernetes-platform-and-how/ Tue, 04 Apr 2023 13:50:07 +0000 https://thenewstack.io/?p=22704363

Within the growing cloud native development space, consensus has emerged around Kubernetes as the de facto standard for organizations. Many

The post Why Create a Developer-Focused Kubernetes Platform and How appeared first on The New Stack.

]]>

Within the growing cloud native development space, consensus has emerged around Kubernetes as the de facto standard for organizations. Many leaders in companies at the forefront of cloud native development in production confirm that Kubernetes is mostly a common orchestration framework for now. Kubernetes, in other words, is not “the hard part.” Instead, it’s “all the other stuff that’s hard: linking everything together and making it work.”

Against a backdrop of developer-first Kubernetes as something of a standard, platform engineering and the developer platform concept have become a kind of Holy Grail for organizations seeking a way to make the cloud native journey smoother for developers at every level of experience. There’s no one tried-and-true developer-focused Kubernetes platform, which is both the challenge and the beauty of the whole idea. The tooling landscape is vast, with organizations putting together their own developer platforms to pick and choose what works best for their needs. However, this can also lead to choice paralysis and false starts, slowing things down and negating the speed benefits going cloud native was meant to introduce.

With this in mind, let’s look at why creating a developer-focused Kubernetes platform makes sense and then explore what components you need to build it.

Developer-First Platforms: Overcoming K8s Complexity

No one denies that the cloud native paradigm introduces new complexity into the development mix. Suddenly the silos of monolithic development have broken down, and the more linear development workflow — code and hand off to someone else — shifts to a decentralized microservices architecture. Now there are different dependencies and new demands, such as the shift-left idea of “you build it, you run it” that has accompanied the cloud native revolution. But in how much depth does a developer really need to know Kubernetes? Does the developer need to be responsible for the full software life cycle or just understand critical aspects of it to facilitate speed of delivery and faster feedback loops?

These questions underpin the discussion on developer platforms, which have been the focus of much debate in recent months. Whether these platforms eliminate frustration, improve productivity or help standardize workflows, one takeaway from the development community (and their colleagues in DevOps and platform engineering) is that a developer-first platform should reduce complexity with standardization but make room for exceptions.

How to Create a Developer-Focused Kubernetes Platform

There is no one-size-fits-all developer platform, which leaves room to pave a path that achieves exactly what an individual organization needs for its developers. What do developers need to know to ship software safely, and how will a platform get them there?

Some key considerations for what development teams need from an efficient developer-focused Kubernetes platform include:

  • Support for developer self-service and visibility. Create a baseline for the tools and visibility required to do the job and build in flexibility to ensure no one is locked in. A self-service baseline allows everyone to work from the same set of standards independently and makes it easy to get new developers onboarded faster to facilitate reliable, efficient production.
  • Make it simple. Yes, developers need to understand Kubernetes, but only to the degree that this understanding contributes to shipping software and getting critical feedback faster. Mobilize simplicity and clarity by giving developers enough capabilities to go fast without worrying about important — but for them, secondary — considerations, such as infrastructure and security.
  • Unleash the power of DevOps/PlatformOps. DevOps, PlatformOps and site-reliability engineering teams are more than just firefighters and the first line of support. Creating a developer-focused self-service platform enables them to focus on strategic projects and support continuous improvement.
  • Centralize key data. Teams need real-time data to gather insight to understand what’s working and what isn’t. Whether measuring performance or tracking compliance, a centralized platform enables the collection of operational and resource data.
  • Focus on faster feedback. The current macroeconomic moment heightens the need for doing things faster and more efficiently. A good developer platform should deliver fast feedback so no one is sitting around waiting for builds to happen.
  • Facilitate the developer experience. Anything that can reduce cognitive load and keep developers focused on their core tasks, particularly when market conditions are challenging, is a good investment.

While all of this provides background and reasoning for why a developer-focused Kubernetes platform makes sense, what components will get them there?

The Core Components of a Developer Platform

Though Kubernetes has existed for years, it still poses challenges, and developer platforms have been crafted to tackle those challenges and provide some relief to ops teams. These platforms aim to answer questions like: How do you build a productive developer workflow on Kubernetes? How do you create development environments, integrating your source control system, development environment and your deployment system? How do you effectively collaborate with increasingly complex workflows?

Every platform will look a bit different, but ultimately you want to select components that help you code, ship and run your service.

Code Components

First, a platform requires setting up, configuring and maintaining local development and test environments that mimic production as closely as possible. Source control and CI/CD functionality also should be in place. Ideally, you would set up your platform to enable developers to code locally but test that code against remote services and data stores (that is, remote-to-local, or “remocal,” development).

The dev environment and workflows created in this step contribute to the developer productivity every organization is after, which is why getting this part of the platform right is critical.

Ship Components

Much of the discussion in the cloud native space about getting developers to “shift left” involves enabling developers to take on end-to-end code-ship-run responsibilities. This isn’t always feasible or desirable; it depends on the organization and its cloud native maturity level. But the underlying idea of giving developers more insight into, and responsibility for, quality-focused tasks throughout the software life cycle does mean that developers are more involved in the phases after their code moves into the ship phase. Many organizations have invested in developer platforms to create frictionless paths into these new areas of developer responsibility, paring them down to the “need-to-know” essentials and abstracting away complex functions that developers don’t care about or don’t need to interact with.

Visibility is one of the must-haves for Kubernetes developers. One of the challenges of distributed services is understanding which versions are deployed and released and all the dependencies that make a developer’s application work. Naturally, this is complex, and visibility will clear the path. At the same time, the platform should provide easy ways to test new versions, using tools like canary releases, so that services can be released incrementally and safely to end users but rolled back quickly if and when something goes wrong.
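As an illustration of the canary idea, the sketch below routes a small, configurable fraction of requests to the new version. Real platforms usually implement this in the ingress or service mesh layer rather than in application code, so treat this purely as a model of the behavior; the version labels are hypothetical.

```python
import random

def choose_version(canary_weight=0.05):
    """Send roughly 5% of traffic to the canary; the rest stays on stable.

    Rolling back a misbehaving canary is just setting the weight to 0.
    """
    return "v2-canary" if random.random() < canary_weight else "v1-stable"

# Rough demonstration of the split over 10,000 simulated requests.
sample = [choose_version() for _ in range(10_000)]
print(sample.count("v2-canary"), "requests hit the canary")
```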

Run Components

When running your services, your platform should include traffic management capabilities, such as load balancing and rate limiting, to ensure the availability and scalability of microservices. But no matter what traffic management policies you create and despite the best testing efforts and the number of canary releases, something that can go wrong probably will. Your platform should be designed to handle these hiccups, enabling rapid mitigation and response to production incidents as part of your workflow.
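Rate limiting is one of those traffic-management capabilities, and a common way to implement it is a token bucket, sketched minimally below. Production platforms typically enforce this at the gateway or mesh rather than in application code, but the underlying logic is the same; the rate and capacity values are illustrative.

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 Too Many Requests

limiter = TokenBucket(rate=10, capacity=20)
print(limiter.allow())  # True while the bucket still holds tokens
```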

And of course, you will want observability to monitor and detect anomalous behavior to more easily and quickly identify concerns.

Real-World Kubernetes, Platforms and the Developer Experience

The developer platform has become a front-and-center way to shape and ease the developer experience from Day 1. The foundation of a good developer platform, regardless of which tools and processes it contains, is about guiding developers in your organization to what they need to know, see and routinely do to be successful.

The post Why Create a Developer-Focused Kubernetes Platform and How appeared first on The New Stack.

]]>
How PaymentWorks Modernized Its Developer Platform with Slim.AI https://thenewstack.io/how-paymentworks-modernized-its-developer-platform-with-slim-ai/ Fri, 31 Mar 2023 18:46:50 +0000 https://thenewstack.io/?p=22704139

PaymentWorks is an enterprise payments-security platform with a mission to mitigate the risks inherent in business-to-business payments. By automating a

The post How PaymentWorks Modernized Its Developer Platform with Slim.AI appeared first on The New Stack.

]]>

PaymentWorks is an enterprise payments-security platform with a mission to mitigate the risks inherent in business-to-business payments. By automating a complex, manual, people-intensive and error-prone payment process, PaymentWorks protects organizations from business payments fraud and ensures regulatory compliance.

Shipping new features quickly while meeting strict security and compliance requirements has always been critical to its success, but its engineers were charged with a new, business-critical software development project with tight deadlines and keen interest from executives, customers and investors.

The engineering team decided to build the new service as a containerized microservice to proactively break apart their existing monolithic, VM-based infrastructure. Doing so would allow them to shorten development cycles, reduce complexity and make developer onboarding easier. They needed to ensure quality, compliance and consistency in delivering production-ready containers as the new “shippable unit of software.” Furthermore, they needed a scalable developer experience that enabled fast onboarding as their business grew and they added more developers to the team.

With this new project, there were goals to streamline the software delivery pipeline:

  • Create secure, production-ready containers automatically as part of the release process that reduce vulnerabilities and attack surface.
  • Remove manual, repetitive activities, such as ad hoc scripting and manual bookkeeping around architecture decisions, that result in mundane work, human error and delays.
  • Fit the container hardening process into existing developer workflows to ease adoption and improve velocity.
  • Reduce DevOps overhead while getting containers from dev to prod.

PaymentWorks also had a goal of full visibility, history and control over the exact versions of the artifacts running in production: where specific images run, which dependencies (packages) they contain and which versions (git commits) of the application code they include. Due to the confidential nature of financial services, the new system needed to be secure and easily auditable.

Implementing this new system would help the company scale more efficiently to meet future needs and shorten test cycles, challenges it faced with its legacy build-and-test pipeline based on EC2 (Amazon Elastic Compute Cloud) instances.

The PaymentWorks engineering team knew that providing developers with the right tools and automation would ensure best practices were met as they transitioned to a containerized platform.

They called on Slim.AI, which offers solutions that automatically optimize the composition and construction of containerized applications to create hardened containers, improve build and deploy cycles, and provide deep insight into application behavior and performance.

The teams began by integrating Slim.AI into the PaymentWorks Jenkins-based CI pipeline, recording containers and related artifacts, and storing them as “Collections” on Slim’s web-based SaaS platform. Jenkins communicates with the Slim.AI platform via API, determining the necessary build details and context to generate properly composed and tagged containers.

During this process, Slim.AI catalogs a record of all build context, container versions, tags, attributes and history. Once containers are built, Slim.AI automatically generates a new environment-specific Docker Compose file that pins the latest versions to semantic references and commits that to GitHub for the specified deployment context (dev, test or production).

The PaymentWorks CD system then references the Slim.AI Collections API to find the containers, Docker Compose files and metadata necessary to deploy and run the system. AWS Elastic Container Registry houses the containers, and Slim.AI automatically keeps track of all container versions for each collection, runs vulnerability scanners, hardens containers and generates artifacts for compliance.
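The general pattern here, independent of Slim.AI’s specific API, is that CI resolves every image to an immutable digest and writes that pinned reference into the deployment descriptor, so each environment runs exactly the artifact that was built and scanned. Below is a generic sketch of that step (not PaymentWorks’ or Slim.AI’s actual code; it assumes the image has been pushed so a repo digest exists).

```python
import subprocess

def image_digest(image):
    """Resolve an image tag to its immutable repo digest via `docker inspect`."""
    out = subprocess.check_output(
        ["docker", "inspect", "--format", "{{index .RepoDigests 0}}", image]
    )
    return out.decode().strip()

def pin_compose_line(service, image):
    # e.g. "web: registry.example.com/web@sha256:..." goes into the
    # environment-specific Compose file committed for the deploy context.
    return f"{service}: {image_digest(image)}"
```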

Developers can interface with Slim.AI via a web-based UI or command-line interface to review artifacts, see inside their recently built containers, and debug and secure their containers in a developer-friendly way. PaymentWorks’ first container to ship reduced the total vulnerability count by 89%, eliminating all critical findings and 60% of high-risk ones.

This integrated container management system allows other aspects of the PaymentWorks environment to interact with Slim.AI programmatically with semantic precision, speed and high automation, whether that’s infrastructure running on AWS or security and compliance audits.

The best part is that it’s completely self-service and hands-off: no friction, manual steps or scripts. According to Matias Elgart, vice president for platform engineering at PaymentWorks, the system has created a way to easily onboard to the new microservices environment as the dev team continues to grow, and developers love that they don’t have to manually manage and share files and scripts locally.

By working with Slim.AI, PaymentWorks has reached its combined goals of improving security and compliance while reducing operational complexity, error and repetitive DevOps cycles, freeing up their teams to focus on building great software for their customers and creating a positive developer experience.

The post How PaymentWorks Modernized Its Developer Platform with Slim.AI appeared first on The New Stack.

]]>
What Is Platform Engineering and When Should You Invest in It? https://thenewstack.io/what-is-platform-engineering-and-when-should-you-invest-in-it/ Thu, 30 Mar 2023 15:57:01 +0000 https://thenewstack.io/?p=22703990

As application platforms grow larger, the idea of DevOps teams where developers support the software development life cycle, but also

The post What Is Platform Engineering and When Should You Invest in It? appeared first on The New Stack.

]]>

As application platforms grow larger, the idea of DevOps teams where developers support the software development life cycle, but also manage infrastructure and the platform, is beginning to reach the limits of what these teams can handle. Rather than making their best application developers work on infrastructure problems, more organizations are realizing that a centralized platform team that specializes in that area is a better use of their developers’ skills.

But what exactly is a platform engineering team, and how is it different from the DevOps team? Should your organization invest in platform engineering? Let’s take a closer look.

Platform Engineering: What Is It and How Did It Come About?

Platform engineering is essentially building (selecting/standardizing on), operating and managing the infrastructure that supports first- and third-party applications. In the days before cloud native application development, a central team provided compute infrastructure for enterprise developers to build and host their applications.

At a certain point, those developers moved to a microservices-based architecture. They didn’t just need virtual machines or servers where they could run their applications. They were building those applications in containers, using Docker for their development infrastructure, and then using some form of orchestrator to run their applications. What they needed was a development platform for containerized application hosting.

By that time, Kubernetes had become the dominant container platform for orchestration. There were others, like D2iQ’s DC/OS and Docker Swarm, but Kubernetes ultimately won the race. The central teams started providing a development platform that included the Kubernetes orchestrator and everything else enterprise developers would need to build and host their applications.

Today, I’ve seen organizations include things like the repo structure (registries) and any kind of application operations infrastructure (build infrastructure, monitoring infrastructure, data stores, logging infrastructure, etc.) in that platform.

I’ve also seen organizations use application-layer infrastructure as part of their platform, including CI/CD pipelines and service mesh. This sometimes includes capabilities around blue-green deployments and canary deployments included as part of the platform. But at its core, the platform is compute infrastructure that’s aligned with a microservices architecture in a true multitenant fashion. That’s what platform teams are responsible for managing.

Platform Engineering vs. DevOps

In many cases, platform engineering is an overlay to DevOps. While DevOps teams tend to work more closely with application teams to understand their operational requirements, platform engineering’s charter becomes standardizing on solutions and infrastructure to support the entire estate (such as choosing a consistent data plane). Overall, platform engineering simplifies operations and reduces overhead to standardize on infrastructure components.

With organizations including so many types of infrastructure as part of their platforms, it makes sense for a centralized team to manage it all. If you have hundreds of developers in your organization, it doesn’t make sense to have them all doing the same set of things.

It would be an extreme waste of resources if every Dev team had to manage their own logging, registry and operating infrastructure. More importantly, it is difficult to implement a corporatewide regulatory framework for security across the platform if every team is doing their own thing. You need a centralized platform, and a specialized team — the platform engineering team — to manage it.

If we look at how DevOps teams are set up, you can never expect a true full-stack software developer to also own systems and infrastructure; that requires a different skill set, mindset and approach. It’s not realistic to have software developers who are writing application code also write platform code or platform automation scripts and manage operations.

When I have seen DevOps teams owning everything end-to-end, there has always been a specific person on the team with operational skills who was owning the Ops aspect of the DevOps. In all other cases, it just doesn’t work.

When to Invest in Platform Engineering

If your organization is entirely in the cloud, and your Dev teams are running their applications on managed Kubernetes services, then a large part of your compute infrastructure and platform is managed by the service provider. In this case, you wouldn’t necessarily need to centralize that function, but there would be bits and pieces of that platform that you would need to run, operate and manage. That’s where it would be useful to invest in platform engineering.

For example, let’s look at logging across all Amazon Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS) clusters in an organization. If you have teams that are building their own Kubernetes clusters through open source distros, you would absolutely want to centralize that function. Why? Primarily because managing, building and operating these different distros is not straightforward; it requires a specific set of skills, and there is quite a learning curve. You certainly wouldn’t want every Dev team in your organization to have to overcome that learning curve.

Also, without a centralized logging infrastructure and a centralized team to oversee it, you run the risk of inefficiency as multiple teams work toward disparate solutions to the same problem. Last, having a platform engineering team in this case would enable application developers to focus on innovation and delivering new features that add value for the business.

An even more important reason to invest in platform engineering is that if you are building everything yourself, you are responsible for managing security, compliance and uptime for your infrastructure. There are specific requirements and playbooks for operating OS distros securely; you don’t want to have to do this work N number of times with every single team in your organization. This is when you want to invest in a platform engineering team.

The main takeaway here is that even if your organization is entirely in the cloud, at some point you will likely want to invest in platform engineering because when you have hundreds of developers building and running applications on cloud-based, managed Kubernetes services, it doesn’t really matter how small the platform component is.

When you take into account that the work needs to be distributed across 10, 20, or 30 or more Dev teams (depending on the size of your organization), you realize the inefficiency of it all. It makes more sense to centralize that function and have one team responsible for platform engineering.

To learn more about new cloud native approaches for establishing security and observability for containers and Kubernetes, check out this O’Reilly eBook by Tigera.

The post What Is Platform Engineering and When Should You Invest in It? appeared first on The New Stack.

]]>
Why Successful Platform Engineering Teams Need a Product Manager  https://thenewstack.io/why-successful-platform-engineering-teams-need-a-product-manager/ Wed, 29 Mar 2023 15:25:25 +0000 https://thenewstack.io/?p=22703792

We’re all aware platform engineering isn’t new. It’s the formal approaches and team structures, which have come into focus in

The post Why Successful Platform Engineering Teams Need a Product Manager  appeared first on The New Stack.

]]>

We’re all aware platform engineering isn’t new. It’s the formal approaches and team structures, which have come into focus in recent years, that are making headlines. And at every juncture, organizations trying these new approaches hear a common refrain: “It’s not a project — it’s a product.”

To be successful with platform engineering, which is something I’ve researched at Puppet and have experienced through my work with NatWest, I believe it’s imperative to have a product manager and that we should continue to trumpet the “product” refrain. Here’s why.

Platform Engineering Is Not a One-and-Done Activity

Most organizations approach toolchains and workflows with a project mindset, meaning they’re looking to deliver a fixed final goal, potentially with a temporary team. Conversely, a product approach recognizes that the platform is an evolving product, one that responds to customer feedback and adapts to meet customers’ ongoing requirements.

Platform engineering needs clarity of purpose to work. It’s an ongoing function, not an end state. That clarity comes from someone who approaches this in much the same way a product manager approaches their product.

There’s a dedication and skillset needed to evangelize the platform, justify changes to the platform and inform stakeholders and users how and why things are changing (as well as why they matter and what value they provide to the end user). Platform product management is much more than a “check the timeline and click the boxes” approach.

The End User Is Always Top Priority in Product Management

Product management helps keep platform engineering focused on solving users’ needs. It helps reduce waste and accelerates the delivery of new technology by creating and supporting an organization-wide platform that people actually want to use. This is key.

In platform engineering, you can’t just build it and then expect them to come — you must build something that makes the end users’ lives easier and makes them want to change their behavior.

One of the clearest contrasts can be seen in operating system (OS) build and provisioning platforms. There, a Windows or Linux team would provide a system to build and configure virtual machines, but aim only to provide the features their own team required to meet its written build standards and approach. This fails to prioritize customer-driven requirements, such as new OS releases, self-service APIs and integrations with other tooling, since these sorts of deliverables would not directly affect the OS build team itself but are central to platform engineering.

Platform engineering product managers are responsible for monitoring how the platform is running, communicating with users to avoid and address issues, and ensuring that the platform roadmap is up to date and being presented to consumers in a way they can actually use.

The exact role of a product manager in platform engineering might be shaped to match the platform they support, but it’s always focused on treating the platform as a product, with ease-of-use front and center.

We Need to Invest More in Product Management

Today, product managers are present in only about a third of the platform teams surveyed in “The State of DevOps 2023” report. Of course, investing in product managers can mean hiring for the skill, but existing staff make great candidates for platform product management. They already know your organization’s culture and processes, which greases the wheels of communication and enables trust right out of the box.

Either way, investing in product managers for your platform means more than getting them in the door. Product management can enhance the careers of the people who choose to do it — it’s not a short-term decision or part-time job. Supporting them includes training and giving them the time and space to grow with the platform and become a trusted conduit of user feedback.

To learn more about platform engineering, check out the “State of DevOps Report, Platform Engineering edition,” on the Puppet website.

The post Why Successful Platform Engineering Teams Need a Product Manager  appeared first on The New Stack.

]]>