{"id":22710277,"date":"2023-06-13T06:00:28","date_gmt":"2023-06-13T13:00:28","guid":{"rendered":"https:\/\/thenewstack.io\/?p=22710277"},"modified":"2023-06-13T11:19:54","modified_gmt":"2023-06-13T18:19:54","slug":"amazon-prime-videos-microservices-move-doesnt-lead-to-a-monolith-after-all","status":"publish","type":"post","link":"https:\/\/thenewstack.io\/amazon-prime-videos-microservices-move-doesnt-lead-to-a-monolith-after-all\/","title":{"rendered":"Amazon Prime Video\u2019s Microservices Move Doesn\u2019t Lead to a Monolith after All"},"content":{"rendered":"\n
In any organizational structure, once you break regular jobs down into overly granular tasks and delegate them to too many individuals, their messaging soon becomes unmanageable, and the organization stops growing.
On March 22, in a blog post that went unnoticed for several weeks, Amazon Prime Video’s engineers reported that the service quality monitoring application they had originally built to determine quality-of-service (QoS) levels for streaming videos — an application they built on a microservices platform — was failing, even at levels below 10 percent of service capacity.

What’s more, they had already applied a remedy: a solution their post described as “a monolith application.”

The change came at least five years after Prime Video — home of on-demand favorites such as “Game of Thrones” and “The Marvelous Mrs. Maisel” — successfully outbid traditional broadcast outlets for the live-streaming rights to carry NFL Thursday Night Football.

One of the leaders in on-demand streaming now found itself in the broadcasting business, serving an average of 16.6 million real-time viewers simultaneously. To keep up with live sports viewers’ expectations of their “networks” — in this case, CBS, NBC, or Fox — Prime Video’s evolution needed to accelerate.

It wasn’t happening. When the 2022 football season kicked off last September, too many of Prime Video’s tweets were prefaced with the phrase, “We’re sorry for the inconvenience.”

Prime Video engineers overcame these glitches, their blog reported, by consolidating QoS monitoring operations that had been separated into isolated AWS Step Functions and Lambda functions into a unified code module.

As initially reported, their results appeared to finally confirm many organizations’ suspicions, well articulated over the last decade, that the costs incurred in maintaining system complexity and messaging overhead inevitably outweigh any benefits realized from adopting microservices architecture.

Once that blog post awakened from its dormancy, several experts declared all of microservices architecture dead.
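Before weighing the reactions, it helps to picture what that consolidation changes mechanically. The sketch below is purely illustrative; the function names and payloads are invented, not Prime Video’s code. In the orchestrated version, every hand-off between stages is serialized and passed along, roughly the way a Step Functions state machine shuttles JSON between Lambda invocations; the consolidated version runs the same stages inside one process.

```python
# Purely illustrative sketch -- none of these names come from Prime Video's code.
import json

def convert_frame(payload: dict) -> dict:
    """First analysis stage: mark the frame as converted for inspection."""
    payload["converted"] = True
    return payload

def detect_defects(payload: dict) -> dict:
    """Second stage: record a defect count for the (converted) frame."""
    payload["defects"] = 0 if payload.get("converted") else -1
    return payload

def orchestrated(frame_id: int) -> dict:
    """Each stage invoked in isolation; state is marshalled between stages,
    simulating the per-transition hand-off of an orchestrated workflow."""
    state = {"frame": frame_id}
    for step in (convert_frame, detect_defects):
        state = json.loads(json.dumps(step(state)))  # simulated hand-off cost
    return state

def consolidated(frame_id: int) -> dict:
    """Same stages run in one process: no per-step marshalling."""
    return detect_defects(convert_frame({"frame": frame_id}))

print(orchestrated(1) == consolidated(1))  # prints True: same result either way
```

The point of the contrast is the hand-off overhead, which per-step orchestration multiplies and in-process consolidation removes; the business logic itself is untouched.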
“It’s clear that in practice, microservices pose perhaps the biggest siren song for needlessly complicating your system,” wrote Ruby on Rails creator David Heinemeier Hansson. “Are we seeing a resurgence of the majestic monolith?” asked .NET MVP Milan Jovanović on Twitter. “I hope so.”

“That’s great news for Amazon because it will save a ton of money,” declared Jeff Delaney on his enormously popular YouTube channel Fireship, “but bad news for Amazon because it just lost a great revenue source.”

Yet there were other experts, including CodeOpinion.com’s Derek Comartin, who compared Prime’s “before” and “after” architectural diagrams and noticed some glaring disconnects between those diagrams and their accompanying narrative.

As world-class experts speaking with The New Stack also noticed, and as a high-ranking Amazon Web Services engineer finally confirmed for us, the solution Prime Video adopted not only fails to fit the profile of a monolithic application; in every respect that truly matters, including scalability and functionality, it is a more evolved microservice than what Prime Video had before.

“This definitely isn’t a microservices-to-monolith story,” remarked Adrian Cockcroft, the former vice president of cloud architecture strategy at AWS, now an advisor for Nubank, in an interview with The New Stack. “It’s a Step Functions-to-microservices story. And I think one of the problems is the wrong labeling.”

Cockcroft, as many regular New Stack readers will know, is one of microservices architecture’s originators, and certainly its most outspoken champion. He has not been directly involved with Prime Video or AWS since becoming an advisor, but he’s familiar with what actually happened there, and he was an AWS executive when Prime’s stream quality monitoring project began.
He described for us a kind of prototyping strategy in which an organization uses AWS Step Functions, coupled with serverless orchestration, to visually model business processes.

With this adoption strategy, an architect can reorganize digital processes essentially at will, eventually discovering their best alignment with business processes. He’s intimately familiar with this methodology because it’s part of AWS’ best practices — advice he himself co-authored. Speaking with us, Cockcroft praised the Prime Video team for having followed that advice.

As Cockcroft understands it, Step Functions was never intended to run processes at the scale of live NFL sports events. It is, rather, a staging system for processes whose eventual, production-ready state would need to become more algorithmic, more efficient, more consolidated. So the trick to making the Step Functions model workable for more than just prototyping is not just to make the model somewhat scalable, but also transitional.

“If you know you’re going to eventually do it at some scale,” said Cockcroft, “you may build it differently in the first place. So the question is, do you know how to do the thing, and do you know the scale you’re going to run it at? Those are two separate cases. If you don’t know either of those, or if you know it’s small-scale, complex, and you’re not exactly sure how it’s going to be built, then you want to build a prototype that’s going to be very fast to build.”

However, he suggested, if an organization knows from the outset that its application will be very widely deployed and highly scalable, it should optimize for that situation by investing more development time up-front. The Prime Video team did not have that luxury.
In that case, Cockcroft said, the team was following best practices: building the best system they could to accomplish the business objectives as they interpreted them at the time.

“A lot of workloads cost more to build than to run,” Cockcroft explained. “[For] a lot of internal corporate IT workloads, lots of things that are relatively small-scale, if you’re spending more on the developers than you are on the execution, then you want to optimize for saving developer time by building it super-quickly. And I think the first version… was optimized that way; it wasn’t intended to run at scale.”

As any Step Functions-based system becomes refined, according to those same best practices, the next stage of its evolution will be transitional. Part of that metamorphosis may involve, contrary to popular notions, service consolidation. Despite how Prime Video’s blog post described it, the result of consolidation is not a monolith. It’s now a fully fledged microservice, capable of delivering the 90% cost reductions its engineers touted.

“This is an independently scalable chunk of the overall Prime Video workload,” described Cockcroft. “If they’re not running a live stream at the moment, it would scale down or turn off — which is one reason to build it with Step Functions and Lambda functions to start with. And if there’s a live stream running, it scales up. That’s a microservice. The rest of Prime Video scales independently.”

The New Stack spoke with Ajay Nair, AWS’ general manager for Lambda and for its managed container service App Runner. Nair confirmed Cockcroft’s account in its entirety: how the project was initially framed in Step Functions, and how it ended up a scalable microservice.

Nair outlined for us a typical microservices development pattern. Here, the original application’s business processes may be too rigidly coupled together to allow for evolution and adaptation. So they’re decoupled and isolated.
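That decouple-and-isolate step can be sketched in miniature. Everything below is an assumption made for illustration — the `FrameBatch` and `DefectReport` types and the two service functions are invented, not AWS’ or Prime Video’s code. Each stage becomes an isolated unit whose typed inputs and outputs form an explicit contract that a separate team could own and observe.

```python
# Illustrative sketch only: all names here are invented for this example.
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameBatch:          # input contract of the detector service
    stream_id: str
    frames: int

@dataclass(frozen=True)
class DefectReport:        # output contract of the detector service,
    stream_id: str         # and input contract of the alerting service
    defects: int

def detector_service(batch: FrameBatch) -> DefectReport:
    """Isolated unit of business logic: everything it needs arrives in `batch`."""
    # Pretend one frame in a hundred trips the quality detector.
    return DefectReport(batch.stream_id, batch.frames // 100)

def alerting_service(report: DefectReport) -> str:
    """Downstream consumer that depends only on the published contract."""
    status = "degraded" if report.defects > 0 else "healthy"
    return f"stream {report.stream_id}: {status} ({report.defects} defects)"

print(alerting_service(detector_service(FrameBatch("tnf-live", 250))))
# prints: stream tnf-live: degraded (2 defects)
```

Note that later consolidating the two functions into one deployable, as the pattern Nair describes allows, leaves these contracts intact; only the process boundary moves.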
This decomposition enables developers to define the contracts that spell out each service’s expected inputs and outputs, requirements and outcomes. For the first time, business teams can directly observe the transactional activities that, in the application’s prior incarnations, had been entirely obscured by its complexity and unintended design constraints.

From there, Nair went on, software engineers may codify the isolated serverless functions as services. In so doing, they may further decompose some services — as AWS did for Amazon S3, which is now served by over 300 microservice classes. They may also consolidate other services. One possible reason: Observing their behavior may reveal they did not actually need to be scaled independently after all.

“It is a natural evolution of any architecture where services that are built get consolidated and redistributed,” said Nair. “The resulting capability still has a well-established contract, [and] has a single team managing and deploying it. So it technically meets the definition of a microservice.”

“I think the definition of a microservice is not necessarily crisp,” stated Brendan Burns, the co-creator of Kubernetes, now corporate vice president at Microsoft, in a note to The New Stack.

“I tend to think of it more in terms of capabilities around functionality, scaling, and team size,” Burns continued. “A microservice should be a consistent function or functions — this is like good object-oriented design. If your microservice is the CatAndDog() service, you might want to consider breaking that into Cat() and Dog() services.
But if your microservice is ThatOneCatOnMyBlock(), it might be a sign that you have broken things down too far.”

“The level of granularity that you decompose to,” explained F5 Networks Distinguished Engineer Lori MacVittie, speaking with The New Stack, “is still limited by the laws of physics, by network speed, by how much [code] you’re actually wrapping around. Could you do it? Could you do everything as functions inside a containerized environment, and make it work? Yes. It’d be slow as heck. People would not use it.”

Adrian Cockcroft advises that the interpretability of each service’s core purpose, even by a non-developer, should be a tenet of microservices architecture itself. That alone should guard against poor design choices.

“It should be simple enough for one person to understand how it works,” Cockcroft advocated. “There are lots of definitions of microservices, but basically, you’ve partitioned your problem into multiple, independent chunks that are scaled independently.”

“Everything we’re describing,” remarked F5’s MacVittie, “is just SOA without the standards… We’re doing the same thing; it’s the same pattern. You can take a look at the frameworks, objects, and hierarchies, and you’d be like, ‘This is not that much different than what we’ve been doing since we started this.’ We can argue about that. Who wins? Does it matter? Is Amazon going to say, ‘You’re right, that’s a big microservice, thank you’? Does it change anything? No. They have solved a problem that they had, by changing how they design things. If they happen to stumble on what they should have been doing in the first place, according to the experts on the Internet, great. It worked for them. They’re saving money, and they did expose one of those problems with decomposing something too far, on a set of networks on the Internet that is not designed to handle it yet.

“We are kinda stuck by physics, right?” she continued.
“We’re unlikely to get any faster than we are right now, so we have to work around that.”

Perhaps you’ve noticed: Enterprise technology stories thrive on dichotomy. For any software architecture to be introduced to the reader as something of value, vendors and journalists frame it in opposition to some other architecture. When an equivalent system or methodology doesn’t yet exist, the new architecture may end up being portrayed as the harbinger of a revolution that overturns tradition.

One reason may be that the discussion online is being led either by vendors, or by journalists who tend to speak with vendors first.

“There is this ongoing disconnect between how software companies operate, and how the rest of the world operates,” remarked Platify Insights analyst Donnie Berkholz. “In a software company, you’ve got ten times the staffing in software engineering, on a per capita basis across the company, as you do in many other companies. That gives you a lot of capacity and talent to do things that other people can’t keep up with.”

Maybe the big blazing “Amazon” brand obscured the fact — despite the business units’ proximity to one another — that Prime Video was a customer of AWS. With its engineers’ blog post, Prime joined an ongoing narrative that may have already spun out of control. Certain writers may have focused so intently upon selected facets of microservices architecture that they let readers draw their own conclusions about what the alternatives to that architecture must look like. If microservices were, by definition, small (an aspect that one journalist in particular was guilty as hell of over-emphasizing), its evil counterpart must be big, or bigness itself.

Subsequently, in a similar confusion of scale, if Amazon Prime Video embraces a monolith, so must all of Amazon.
Score one come-from-behind touchdown for monoliths in the fourth quarter, and cue the Thursday Night Football theme.

“We’ve seen the same thing happening over and over across the years,” mentioned Berkholz. “The leading-edge software companies, web companies, and startups encounter a problem because they’re operating at a different scale than most other companies. And a few years later, that problem starts to hit the masses.”

The original “axis of evil” in the service-orientation dichotomy was 1999’s Big Ball of Mud. First put forth by Professors Brian Foote and Joseph Yoder of the University of Illinois at Urbana-Champaign, the Big Ball helped catalyze a resurgence in support for distributed systems architecture. It was seated at the discussion table where the monolith sits now, but not for the same reasons.

The Big Ball wasn’t a daunting tower of rigid, inflexible, tightly coupled processes, but rather programs haphazardly heaped onto other programs, with data exchanged between them by means of file dumps onto floppy disks carried down office staircases in cardboard boxes. Amid the digital chaos of the 1990s and early 2000s, anything definable as not a Big Ball of Mud was already halfway beautiful.

“Service Oriented Architecture was actually the same idea as microservices,” recalls Forrester senior analyst David Mooter. “The idea was, you create services that align with your business capabilities and your business operating model. Most organizations, what they heard was, ‘Just put stuff [places] and do a Web service,’ [the result being] you just make things SOAP. And when you create haphazard SOAP, you create Distributed Little Balls of Mud.
SOA got a bad name because everyone was employing SOA worst practices.”

Mooter shared some of his latest opinions in a Forrester blog post entitled “The Death of Microservices?” In an interview with us, he noted, “I think you’re seeing, with some of the reaction to this Amazon blog, when you do microservices worst practices, and you blame microservices rather than your poor architectural decisions, then everyone says microservices stink… Put aside microservices: Any buzzword tech trend cannot compensate for poor architectural decisions.”

The sheer fact that “Big Ball” is a nebulous, plastic metaphor has enabled almost any methodology or architecture that fell out of favor over the past quarter-century to become associated with it. When microservices makes inroads with organizations, it’s the monolith that gets to wear the crown of thorns. More recently, with some clever phraseology, microservices has carried the moniker of shame.

“Our industry swings like a pendulum between innovation, experimentation, and growth (sometimes just called ‘peacetime’) and belt-tightening and pushing for efficiency (‘wartime’),” stated Laura Tacho, a long-time friend of The New Stack and a professional engineering coach. “Of course, most companies have both scenarios going on in different pockets, but it’s obvious that we’re in a period of belt-tightening now. This is when some of those choices — for example, breaking things into microservices — can no longer be justified against the efficiency losses.”

Berkholz has been observing the same trend: “There’s been this push back-and-forth within the industry — some sort of a pendulum happening, from monolith to microservices and back again. Years ago, it was SOA and back again.”

Defenders of microservices against the mud-throwing that happens when the pendulum swings back say their architecture won’t be right for every case, or even every organization. That’s a problem.
Whenever a market is perceived as being served by two or more equivalent, competing solutions, that market may correctly be portrayed as fragmented. And that is exactly the kind of market enterprises typically avoid participating in.

“Fragmentation implies that the problem hasn’t been well-solved for everybody yet,” Berkholz told us, “when there’s a lot of different solutions, and nobody’s consolidated on a single one that makes sense most of the time. That is something that companies watch. Is this a fragmented ecosystem, where it’s hard to make choices? Or is this an ecosystem where there’s a clear and obvious master?”

From time to time, Lori MacVittie told us, F5 Networks surveys its clients, asking them what relative percentages of their application portfolios they would describe as monoliths, microservices, mobile apps, and middleware-infused client/server apps. “Most organizations were operating at some percentage of each of those,” she told us. When the question was adjusted to ask only whether their apps were “traditional” or “modern,” the split has usually been 60/40, respectively.

“They’re doing both,” she said. “And within those, they’re doing different styles. Is that a mess? I don’t think so. They had specific uses for them.”

“I kind of feel like microservice-vs.-monolith isn’t a great argument,” stated Microsoft’s Brendan Burns. “It’s like arguing about vectors vs. linked lists or garbage collection vs. memory management. These designs are all tools — what’s important is to understand the value that you get from each, and when you can take advantage of that value. If you insist on microservicing everything, you’re definitely going to microservice some monoliths that probably you should have just left alone. But if you say, ‘We don’t do microservices,’ you’re probably leaving some agility, reliability and efficiency on the table.”

That Dear Perfection
Breakdown

Buildup