Publish-Subscribe: Introduction to Scalable Messaging - The New Stack

Publish-Subscribe: Introduction to Scalable Messaging

In the pub/sub messaging pattern, publishers do not send messages directly to all subscribers; instead, messages are sent via brokers. Publishers do not know who the subscribers are or to which (if any) topics they subscribe.
Jul 21st, 2020 8:32am by
Feature image via Pixabay.

The publish-subscribe (or pub/sub) messaging pattern is a design pattern for exchanging messages between senders (publishers) and receivers (subscribers). Subscribers receive messages only on the topics they subscribe to, which keeps the two sides loosely coupled and lets each scale independently.

Messages are sent (pushed) from a publisher to subscribers as they become available. The host (publisher) publishes messages (events) to channels (topics). Subscribers can sign up for the topics they are interested in.

This is different from the standard request/response (pull) models, in which consumers have to check repeatedly whether new data has become available. This makes the pub/sub method well suited to streaming data in real-time.

It also means that dynamic networks can be built at internet scale. However, building a messaging infrastructure at such a scale can be problematic.

This introduction to the pub/sub messaging pattern describes what it is and why developers use it, and discusses the difficulties that must be overcome when building a messaging system at scale.

Caption: The Ably realtime platform uses the publish-subscribe pattern at internet scale for delivering messages in real-time.

What Is Pub/Sub? Loose Coupling and Scaling

In the pub/sub messaging pattern, publishers do not send messages directly to all subscribers; instead, messages are sent via brokers. Publishers do not know who the subscribers are or to which (if any) topics they subscribe. This means publisher and subscriber operations can operate independently of each other. This is known as loose coupling and removes service dependencies that would otherwise be there in traditional messaging patterns.
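The loose coupling described above can be illustrated with a minimal in-process sketch. The `Broker` class and its `subscribe` and `publish` methods are hypothetical names chosen for this example; a real broker would sit on the network between separate processes, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class Broker:
    """Routes each published message to the handlers subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver a message to the topic's current subscribers.

        Returns the number of handlers called. The publisher never sees
        who they are, and publishing to an empty topic is not an error.
        """
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
scores = []
broker.subscribe("scores", scores.append)

delivered = broker.publish("scores", "15-0")  # one subscriber receives it
ignored = broker.publish("weather", "sunny")  # nobody is listening; publisher is unaffected
```

The publisher only ever names a topic. Adding or removing subscribers requires no change on the publishing side.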

Pub/sub is different from the standard request/response models, in which consumers poll (pull) to check if new data is available. This makes the pub/sub method central to effective streaming of data in real-time.

The pub/sub pattern allows extremely dynamic networks to be built at scale without overloading the publishing components or causing unnecessary costs. However, scaling brings its own difficulties, and the different ways of getting around them need consideration.

Typical uses of the pub/sub pattern include event messaging, instant messaging, and data streaming (such as live-streaming sporting events). Pub/sub is also used for workload balancing and with asynchronous workflows.

Communication infrastructure for a pub/sub system (Diagram adapted from msn).

A Background to Messaging Systems and Pub/Sub

A simple information system can follow a simple pattern: input — processing — output. At a reasonable scale, the system will need multiple input and output modules for handling concurrent requests. A problem then arises of routing messages from input modules to their respective output modules.

To address this routing problem, the input and output modules need an addressing mechanism. The processing module will process the messages and route them to the correct recipient based on an address.

At internet scale, the publish-subscribe pattern can handle tens of thousands of concurrent connections.

At internet scale, the system will handle thousands or even tens of thousands of concurrent connections. It also needs to handle high message volumes and a global geographical spread of users.

At such a massive scale, the system needs to solve the following problems:

  • Because of the high volume and geographical spread, the load needs to be distributed between multiple processing modules.
  • Predefined addressing between the modules becomes a huge overhead.

In short, the problems come down to minimizing the shared knowledge of addresses. Pub/sub solves the problems by using a data pipe through which modules can post and retrieve their messages.

The modules do not need to maintain shared knowledge of the whereabouts of other modules. The input modules only accept user input, processing modules only process the data, and the output modules only display the output.

In pub/sub, there is one channel for posting messages and one for retrieving them. It happens in steps like this:

  1. The input module will gather the user input and post the message in the pre-processing channel.
  2. The processing module will pick up messages from this channel, process them, and post the results to the post-processing channel.
  3. Lastly, the output module will collect the message from the post-processing channel and display it on the users’ screen.
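The three steps above can be sketched with two in-memory channels. This is only an illustration under simplifying assumptions: the `Channel` class, the module function names, and the upper-casing used as stand-in "processing" are all hypothetical, and everything runs in one process.

```python
from collections import deque
from typing import List, Optional


class Channel:
    """A FIFO pipe: modules post to it and retrieve from it, nothing more."""

    def __init__(self) -> None:
        self._messages = deque()

    def post(self, message: str) -> None:
        self._messages.append(message)

    def retrieve(self) -> Optional[str]:
        """Return the oldest message, or None when the channel is empty."""
        return self._messages.popleft() if self._messages else None


pre_processing = Channel()
post_processing = Channel()


def input_module(user_input: str) -> None:
    # Step 1: gather input and post it; no knowledge of who processes it.
    pre_processing.post(user_input)


def processing_module() -> None:
    # Step 2: drain the pre-processing channel, process, and post the results.
    while (message := pre_processing.retrieve()) is not None:
        post_processing.post(message.upper())


def output_module() -> List[str]:
    # Step 3: collect processed messages for display.
    shown = []
    while (message := post_processing.retrieve()) is not None:
        shown.append(message)
    return shown


input_module("hello")
processing_module()
displayed = output_module()
```

None of the modules holds an address for any other module. Each one knows only the channel it reads from or writes to.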

The same pattern can be followed at any scale.

In pub/sub messaging pre- and post-processing of the messages is used to address routing problems at internet scale.

Why Developers Use Pub/Sub

Consider a hypothetical logistics company. It would typically have a mix of customer data and generic data, and a highly variable customer load. The data channels between the customers, the drivers, and the delivery office may also be unreliable. It is important that subscribers receive all of the messages customers are sending, but they do not need to know who the customers are or how many there are.

It is also important that the company does not over-provision its service (which would be costly) or over-provision load balancing, which would hurt the performance of the network because of the extra complexity.

It is important to remember that the pub/sub pattern is suited to conveying information whose relevance fades fast. (What is the score now? And now?) As information is frequently replaced, there is no pressing need to store it. Usually, it is enough to keep the most recent message, or enough information to recreate a view of recent events.
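As a rough sketch of "keeping the most recent message", the broker below stores only the last value per topic and replays it to anyone who subscribes later. The class name `LatestValueBroker` and the topic name are hypothetical, and a production system would also need expiry and persistence, which are left out here.

```python
from typing import Any, Callable, Dict, List


class LatestValueBroker:
    """Pub/sub broker that retains only the newest message on each topic."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Any], None]]] = {}
        self._latest: Dict[str, Any] = {}

    def publish(self, topic: str, message: Any) -> None:
        # Overwrite rather than append: older values are no longer relevant.
        self._latest[topic] = message
        for handler in self._handlers.get(topic, []):
            handler(message)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(topic, []).append(handler)
        # A late subscriber immediately gets the current state, if any.
        if topic in self._latest:
            handler(self._latest[topic])


broker = LatestValueBroker()
broker.publish("match/score", "15-0")
broker.publish("match/score", "30-0")

late_visitor = []
broker.subscribe("match/score", late_visitor.append)  # receives "30-0" straight away
broker.publish("match/score", "40-0")
```

The late visitor never sees "15-0", because that message was replaced before they subscribed.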

Developers use pub/sub to take advantage of edge computing and the network backbone:

  • Edge computing allows you to scale the system at the edge. This is where scaling is easier to implement and also where it is most cost-effective.
  • Using the network backbone and multiple points of presence means message delivery can be much faster and more reliable.

How Pub/Sub Is Adopted in the Real-World

Matthew O’Riordan
Matthew O’Riordan is the technical co-founder of Ably, a globally distributed data stream network that is protocol agnostic. He has been a programmer for over 20 years. He first started working on commercial internet projects in the mid-90s, when Internet Explorer 3 and Netscape were battling it out, and Java and Flash were the up-and-coming technologies to bring interactivity to browsers. Throughout his time as a developer, he has contributed not only to complex technical aspects of various projects, but also to the commercial, UX and design aspects. He has also built and sold two tech businesses along the way, the last one being Econsultancy. At Ably, as a developer himself, his focus is not just on the best technical solution, but more often on the experience developers have with their APIs. Developer relations for Ably is necessarily at the heart of everything they do, given their customers are all developers.

Event messaging: Pub/sub is widely used in delivery logistics. As we shop online more frequently for a wider variety of goods, package delivery has become commonplace. Logistics companies need to use delivery resources more efficiently. To optimize delivery, dispatching systems need up-to-date information on where their drivers are. Pub/sub event messaging helps logistics companies do this.

Dispatchers need to access drivers’ location information on demand, ideally continually. This data will allow them to better predict arrival times and improve routing solutions. Dispatching systems also send out information such as cancellations, traffic information, and new package pickups.

As the day goes on, this information becomes more critical as it gets harder to maintain delivery time windows and schedule adjustments must be made to maximize the number of on-time deliveries.

This is a lot of data, and not all of it is relevant at any given time. To get around this problem, devices need to be able to subscribe to updates that matter to them. With a pattern like pub/sub, all parties only subscribe to whatever is relevant to them:

  • Driver devices can subscribe to traffic and route information.
  • Dispatching and ERP systems can subscribe to the completed delivery updates.
  • Tracking and dispatching systems can get live position updates when they need them.
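The subscriptions listed above might look like the following sketch, in which each system is modeled as a plain list acting as its inbox. The topic names and message payloads are invented for illustration and do not come from any real dispatching system.

```python
from collections import defaultdict

subscriptions = defaultdict(list)  # topic -> inboxes subscribed to it


def subscribe(topic, inbox):
    subscriptions[topic].append(inbox)


def publish(topic, update):
    # Fan out to every inbox on this topic and to no one else.
    for inbox in subscriptions[topic]:
        inbox.append(update)


driver_device, dispatch_system, tracking_system = [], [], []

subscribe("traffic", driver_device)
subscribe("routes", driver_device)
subscribe("deliveries/completed", dispatch_system)
subscribe("positions", dispatch_system)
subscribe("positions", tracking_system)

position = {"driver": 7, "lat": 51.5, "lon": -0.12}
publish("traffic", "congestion on ring road")
publish("positions", position)
publish("deliveries/completed", "parcel 123 delivered")
```

The driver device receives only the traffic update, while the position update fans out to both the dispatching and tracking systems.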

These systems enable customers to track deliveries in real-time and, for example, to reschedule a package in transit. They can also alert drivers to pickups to be made en route, which allows for more effective routing, reduces fuel costs, and improves efficiency.

Other use-case examples include:

  • Instant messaging: Services that provide near-instantaneous interaction, for example, a notification that the person you’re conversing with is typing.
  • Data streaming: Applications can provide data instantly to clients for processing, saving or live preview. For example, providing the latest match scores in a tennis tournament and making sure they are available to a new website visitor the moment the page loads. See Ably case study in Further reading.
  • Workload balancing: Knowing the capacity and location of parts of a system allows for better utilization of effort. This includes, for example, allowing logistics dispatchers to use partly empty delivery vehicles for pickup and on-demand delivery.
  • Asynchronous workflows: As an example, factory machines and power, water, and other utility sensors can update central control systems live. Improving the efficiency of the supply chain allows for just-in-time manufacturing and capacity control.

Pub/Sub Code Examples

Here are two examples of pub/sub applications with code snippets.

Faye

Faye is an open source system used by Aha! Roadmap software and Shopify. It is based on pub/sub messaging. The following code sample shows how to start a server, create a client, and send messages:

Ably Realtime Chat App

Here is an example of how you might add pub/sub functionality to a chat app using one of Ably‘s Realtime SDKs.

When the app launches, the SDK initializes and subscribes to the topic that represents a public chat room.


Subsequently, when the user wants to send a chat message, the chat app publishes the message on the same topic.


The app unsubscribes from the channel when the user logs out or leaves the chat room.
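The subscribe, publish, and unsubscribe lifecycle described above can be sketched with an in-memory stand-in. This is not the Ably SDK: `ChatTopic` and its methods are hypothetical names used only to show the order of operations, and a real SDK would do this over a network connection.

```python
from typing import Callable, List


class ChatTopic:
    """In-memory stand-in for a topic that represents one public chat room."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._listeners: List[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> Callable[[], None]:
        """Register a listener and return a function that removes it again."""
        self._listeners.append(listener)

        def unsubscribe() -> None:
            if listener in self._listeners:
                self._listeners.remove(listener)

        return unsubscribe

    def publish(self, sender: str, text: str) -> None:
        # Copy the list so a listener may unsubscribe while being notified.
        for listener in list(self._listeners):
            listener(f"{sender}: {text}")


room = ChatTopic("public-chat")

alice_screen = []
leave_room = room.subscribe(alice_screen.append)  # app launch: join the room

room.publish("bob", "hi everyone")                # someone sends a chat message
leave_room()                                      # user logs out or leaves
room.publish("bob", "anyone still here?")         # no longer delivered to alice
```

Returning the unsubscribe function from `subscribe` is one possible design choice. It keeps the cleanup step next to the subscription it undoes.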

What to Consider When Pub/Sub Is Deployed and Scaled

It is straightforward to implement a single-channel pub/sub messaging framework. But when you start to scale, the classic problems of distributed systems engineering emerge. When scaling to multiple channels and even greater complexity, the problems increase, and maintaining reliability becomes difficult.

The Problems of Building a Messaging System at Scale

Distributed messaging systems should ideally have three properties: reliability, speed, and ordering. However, it’s usually the case that you can only choose two of them. To create a system that allows all three, you have to start at the design level with a watertight mathematical model. It is close to impossible to add the missing third property later.

These are the problems to deal with:

  • Ordering of messages. As you start distributing messages over a large network, problems arise with reliably reconstructing the order in which the messages are meant to be delivered. To be reliably fast, you have to send messages using multiple routes in parallel, but you also have to be able to re-order and maintain their original sequence.
  • Queuing and auto-persistence of messages. For fault-tolerant, reliable messaging you must build in auto-persistence; otherwise, reconstruction is impossible if a system goes down and there are no records. If you don’t queue messages, you can’t reliably reconstruct an order or handle fluctuations in bandwidth.
  • Send exactly once. To send a message once and for it only to be received once at its required destination is a classic problem. If you don’t know who is receiving the message, it has to go everywhere. Either you have to have logic in the network to stop it arriving twice, or in the application to stop it from being processed twice. Otherwise, you might trigger an event twice with unintended consequences. For example, while making an online payment a user is disconnected and quickly reconnects. If exactly-once semantics are not supported, the user can end up getting charged twice when they reconnect.
  • Distributed storage. Fault tolerance requires multiple points of redundancy, failover storage, storage in different physical locations, and auto-healing networks. True reliability requires different physical hardware along with multiple cloud instances. The trade-off with such redundancy is complexity against security and safety.
  • Load surge and slowdown. You need to scale a very transient load dynamically, allowing quick scale-up and slower scale-down, to maintain a fair and available network for users.
  • Rate limitation. Fair workload balancing is complicated. When your system becomes complex you need to consider how to manage customer usage. You have to provision service capacity for different customers fairly, without imposing hard limits.
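Two of the problems above, ordering and duplicate delivery, can be handled on the receiving side if each message carries a sequence number. The sketch below assumes the publisher numbers messages 1, 2, 3, and so on per channel. That numbering scheme and the `OrderedReceiver` class are assumptions made for this example, not a description of how Ably or any other platform works.

```python
from typing import Any, Dict, List


class OrderedReceiver:
    """Releases messages in sequence order and drops duplicates."""

    def __init__(self, first_seq: int = 1) -> None:
        self._next_seq = first_seq          # next sequence number to release
        self._pending: Dict[int, Any] = {}  # early arrivals, keyed by sequence

    def receive(self, seq: int, payload: Any) -> List[Any]:
        """Accept one message and return the payloads now deliverable, in order."""
        if seq < self._next_seq or seq in self._pending:
            return []  # already released or already buffered: a duplicate
        self._pending[seq] = payload
        ready = []
        # Release the longest unbroken run starting at the expected number.
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready


receiver = OrderedReceiver()
# Messages arrive out of order over parallel routes, some of them twice.
arrivals = [(2, "b"), (1, "a"), (2, "b"), (3, "c"), (1, "a")]

processed = []
for seq, payload in arrivals:
    processed.extend(receiver.receive(seq, payload))
```

This corresponds to the "logic in the application" option: the network may deliver a message more than once, but each message is processed once and in its original order. Note that a gap in the sequence holds back everything after it, so a real system would also need a timeout or retransmission strategy.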

These are all problems of building a system at scale. Because you don’t necessarily know all the information you might need about your system at any given time, either the framework needs to be clever enough to handle it, or all the applications in your system need to be quite advanced.

Ably balances the above concerns through judicious use of the TCP layer. By allowing multiple paths, we gain reliability, but not at the expense of speed: we can do fast pathing because we control the path we follow. Also, because of the way the network is set up, we can maintain ordering, which is often lost in the trade-off with the speed of delivery.

This is baked in at the design stage because the problems that arise when building in a global framework are almost impossible to correct at a later stage.

SaaS or Self-Deploy?

You can either build a pub/sub messaging infrastructure yourself (self-deploy) or adopt a cloud native Software-as-a-Service (SaaS) infrastructure, such as Ably.

Solving the design considerations of building a globally scaling system is far from easy for reasons described in the previous section. Building your own messaging system requires budgeting for more design upfront.

If choosing to self-deploy, there are also considerations such as infrastructure setup, installation, and framework configuration. Doing these yourself gives you oversight of building the features you want in your system, but it is also time-consuming and expensive.

The advantages of “as-a-service” pub/sub infrastructure over self-deployment are:

  • Reduced development time. Pub/Sub isolates application development from the messaging infrastructure.
  • Managed infrastructure is preconfigured. System tuning, security and design considerations are costly and time-consuming.
  • Programming options. Managed services support popular programming languages and frameworks. On the other hand, message broker frameworks support only a few languages. Building and maintaining SDKs for your own message broker is a diversion of development effort and time.
  • Skills. Hiring distributed systems engineers is difficult. If putting together a systems engineering team becomes part of your core infrastructure, you then have to maintain their skill set.
  • Cost. Most SaaS business models offer controllable levels of expenditure. You pay according to your needs and usage. Although it might seem cheaper to self-deploy, this hides the amount of investment required to build, run, and maintain the software. Your cloud bills are not the only expense.

Publish-Subscribe at Ably

Ably is a realtime messaging platform built on our own proprietary publish-subscribe messaging infrastructure. We deliver billions of realtime messages every day to more than 50 million end-users across web, mobile, and IoT platforms. We power things like HubSpot’s live chat, in-play scores for millions of Australian Open fans, and realtime transit updates for three million Chicagoans.

Ably’s platform is mathematically modeled and architected around Four Pillars of Dependability: Performance, Availability, Integrity, and Reliability. Where other providers sacrifice integrity of data for low latencies, or vice versa, Ably guarantees message ordering and delivery without sacrificing latencies, fault tolerance, or service availability. This approach means we can provide a pub/sub messaging service guaranteed to operate within strict, dependable, predictable, and transparent boundaries.

Developers place their trust in Ably to build real-time capabilities in their apps. Our feature-rich platform includes multiprotocol pub/sub messaging, presence, push notifications, free streaming data sources from across industries like transportation and finance, and integrations that extend Ably into third-party clouds and systems like AWS Lambda and RabbitMQ.

Businesses use Ably to distribute their streaming data to other businesses. This allows them to offload the engineering and data delivery challenges of providing data streams that are performant, reliable, and available to consume with various protocols.

Further Reading

See the in-depth Ably article, Everything you need to know about publish-subscribe, for further details on the publish-subscribe pattern.

TNS owner Insight Partners is an investor in: Ably, Real.