As noted in the 2021 Apache Pulsar User Survey Report, Apache Pulsar adoption and community engagement skyrocketed over the past year.
Key trends driving Pulsar adoption include the move to containers and cloud strategies, the need to solve for unprecedented scale and management complexity, the pivot from a pure streaming workload to unified batch and streaming workloads, and the need to unlock new use cases.
Pulsar’s cloud-native capabilities, unified messaging and streaming, scalability and reliability, and superset of built-in features that enable new use cases and streamline operations make it uniquely positioned to meet many of today’s emerging needs.
Here we look at the key takeaways from the 2021 Apache Pulsar User Survey Report and examine each highlight in more detail.
The two most important takeaways from the Pulsar User Survey 2021 are:
While overall Pulsar adoption has grown significantly, production deployments have seen the most meaningful growth (see graph above). The 2021 Survey Report reveals that 51% of respondents were using Pulsar in production, compared to 31% the year before. The increase in production use cases demonstrates Pulsar's ability to power mission-critical applications in the real world.
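The jump from 31% to 51% of respondents can be read two ways: as an absolute change in percentage points or as a relative increase. The short Python sketch below computes both from the two survey figures; `share_change` is a hypothetical helper introduced only for illustration.

```python
def share_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    point_change = after_pct - before_pct
    relative_change = point_change / before_pct * 100
    return point_change, relative_change


# Share of survey respondents running Pulsar in production: 2020 vs. 2021.
points, relative = share_change(31, 51)
print(f"{points:.0f} percentage points, {relative:.1f}% relative increase")
```

Among respondents, then, the production share grew by 20 percentage points, which is roughly a 65% relative increase over the previous year.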
Question: How many messages does your organization process with Pulsar every day? Response: 12% of the respondents process over one trillion messages per day.
Pulsar has also seen an increase in the number of large-scale enterprise deployments. Twelve percent of respondents said that their organization processes more than 1 trillion messages per day using Pulsar. Tencent, Splunk, Newland Digital Technology Co Ltd, Kingsoft Cloud, and Pactera are just a handful of the companies that use Pulsar to process more than 1 trillion messages per day.
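To put one trillion messages per day in perspective, it helps to convert it to an average per-second rate. The Python sketch below does that arithmetic under the simplifying assumption of a perfectly even load across the day, so real peak rates would be higher; `average_rate_per_second` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(messages_per_day: int) -> float:
    """Average throughput if messages were spread evenly over 24 hours."""
    return messages_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(1_000_000_000_000)
print(f"{rate:,.0f} messages/second on average")
```

That works out to roughly 11.6 million messages per second, sustained around the clock, before accounting for traffic peaks.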
The growing number of companies running Pulsar at large scale illustrates its ability to meet today's scalability, reliability, and flexibility requirements. Notably, Pulsar is meeting the needs of companies seeking a unified messaging and streaming platform.
Question: What other message queues does your organization use in addition to Pulsar? Response: 68% of respondents use Kafka in addition to Pulsar.
Question: If you use connectors, which connectors do you use or plan to use for Pulsar? Response: 34% of respondents said Kafka on Pulsar (KoP).
A major insight from the user survey is the number of Kafka users who are adopting Pulsar: 68% of respondents said that they use Kafka in addition to Pulsar. Given that Kafka is an older and more widely adopted technology, we can infer that these are likely companies that were already using Kafka and then decided to adopt Pulsar (rather than Pulsar users who later adopted Kafka).
The figure below, from API7 (1), illustrates the increase in Pulsar project engagement. Perhaps even more interesting, it shows that the Apache Pulsar community has surpassed the Apache Kafka community in monthly active contributors.
The 2021 survey also shows that more than one third of respondents use, or are planning to use, Kafka on Pulsar (KoP). KoP, which was launched in 2020, enables Kafka users to migrate their existing Kafka applications and services to Pulsar without modifying code.
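Because KoP is a protocol handler that lets the Pulsar broker speak the Kafka wire protocol, the client side of a migration can, in the simplest case, be a configuration change: the application keeps its Kafka client code, and its `bootstrap.servers` setting is pointed at a Pulsar broker with KoP enabled. The Python sketch below models only that idea with plain dictionaries. The hostnames and `point_at_kop` are hypothetical, the port depends on how the broker's KoP listener is configured, and real migrations may also need to revisit settings such as authentication.

```python
def point_at_kop(kafka_config: dict, kop_bootstrap: str) -> dict:
    """Return a copy of a Kafka client config aimed at a KoP-enabled Pulsar broker.

    Only the bootstrap address changes; the other client settings, and the
    application code that uses them, are left as they were.
    """
    migrated = dict(kafka_config)  # shallow copy so the original config is untouched
    migrated["bootstrap.servers"] = kop_bootstrap
    return migrated


# Hypothetical settings of an existing Kafka producer.
existing = {
    "bootstrap.servers": "kafka-1.example.internal:9092",
    "acks": "all",
    "client.id": "orders-service",
}

# Hypothetical Pulsar broker whose KoP listener is assumed to be on port 9092.
migrated = point_at_kop(existing, "pulsar-broker-1.example.internal:9092")

changed = {key for key in existing if existing[key] != migrated[key]}
print(changed)  # {'bootstrap.servers'}
```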
KoP lowers the barrier to Pulsar adoption for Kafka users, and its popularity suggests that Kafka users are increasingly looking to Pulsar to solve problems and to enable use cases they cannot achieve with Kafka.
The high percentage of respondents (68%) using both Kafka and Pulsar may seem counterintuitive, as the technologies serve many of the same use cases. In fact, however, there are distinct differences between Pulsar’s and Kafka’s use cases and capabilities.
Kafka was built to support data pipelines and large-scale data movement to centralized locations. Pulsar, by contrast, was created to serve both messaging and data streaming use cases that require handling a larger number of topics with complex topologies and sophisticated consumption models.
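In Pulsar, much of this flexibility in consumption comes from subscription types: Exclusive, Failover, Shared, and Key_Shared. The Python sketch below is a toy, in-memory model of how each type distributes a topic's messages among the consumers of one subscription. It is not Pulsar's client API or the broker's actual dispatch logic; `dispatch` is a hypothetical function, and round-robin and CRC32 hashing are simplifications chosen for illustration.

```python
import zlib
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Toy model of how one subscription hands messages to its consumers.

    messages:  list of (key, payload) tuples, in publish order
    consumers: list of consumer names attached to the subscription
    mode:      "exclusive", "failover", "shared", or "key_shared"
    Returns a dict mapping each consumer name to the payloads it received.
    """
    if not consumers:
        raise ValueError("a subscription needs at least one consumer")
    received = {c: [] for c in consumers}
    if mode == "exclusive":
        # Only one consumer may attach; it receives every message in order.
        if len(consumers) != 1:
            raise ValueError("an exclusive subscription allows exactly one consumer")
        received[consumers[0]] = [payload for _, payload in messages]
    elif mode == "failover":
        # One active consumer receives everything; the others wait as standbys.
        # Simplification: the first consumer in the list is treated as active.
        received[consumers[0]] = [payload for _, payload in messages]
    elif mode == "shared":
        # Messages are spread across all consumers; round-robin stands in for
        # the broker's real dispatch, which also depends on consumer capacity.
        for consumer, (_, payload) in zip(cycle(consumers), messages):
            received[consumer].append(payload)
    elif mode == "key_shared":
        # Messages with the same key always reach the same consumer, so per-key
        # order is kept. CRC32 modulo stands in for Pulsar's hash-range assignment.
        for key, payload in messages:
            owner = consumers[zlib.crc32(key.encode()) % len(consumers)]
            received[owner].append(payload)
    else:
        raise ValueError(f"unknown subscription mode: {mode!r}")
    return received


msgs = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-3", "d")]
print(dispatch(msgs, ["c1", "c2"], "shared"))    # {'c1': ['a', 'c'], 'c2': ['b', 'd']}
print(dispatch(msgs, ["c1", "c2"], "failover"))  # {'c1': ['a', 'b', 'c', 'd'], 'c2': []}
```

The key_shared branch shows the trade-off the other modes cannot offer together: work is spread across consumers as in Shared, yet all messages for `user-1` land on one consumer in publish order.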
Pulsar’s built-in multi-tenancy, geo-replication, and scalability enable new use cases and capabilities that Kafka cannot match. The top use cases reported in the survey are: (1) Message Queues, (2) Pub/Sub, (3) Data Pipelines, (4) Stream Processing, (5) Microservices/Event Sourcing, (6) Data Integration, (7) Change Data Capture, and (8) Streaming ETL. This list demonstrates Pulsar’s ability to address a broader range of use cases.
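Multi-tenancy in Pulsar is visible in the topic name itself, which has the form `{persistent|non-persistent}://tenant/namespace/topic`; the namespace is where policies such as geo-replication are typically configured, and a bare name like `orders` resolves to `persistent://public/default/orders`. The Python sketch below parses that hierarchy. `parse_topic`, `TopicName`, and the tenant and namespace values in the example are hypothetical, and the sketch handles only the full form and the bare-name shorthand.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str      # top-level isolation unit, e.g. one team or customer
    namespace: str   # administrative unit within a tenant (policies live here)
    local_name: str  # the topic's own name


def parse_topic(name: str) -> TopicName:
    """Split a Pulsar topic name into its multi-tenant components."""
    if "://" not in name:
        if "/" in name:
            raise ValueError(f"this sketch expects a bare name or a full topic URL: {name!r}")
        # A bare name lands in the default tenant and namespace.
        return TopicName("persistent", "public", "default", name)
    domain, rest = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
    tenant, namespace, local_name = parts
    return TopicName(domain, tenant, namespace, local_name)


print(parse_topic("persistent://acme/payments/orders"))
# TopicName(domain='persistent', tenant='acme', namespace='payments', local_name='orders')
print(parse_topic("orders"))
# TopicName(domain='persistent', tenant='public', namespace='default', local_name='orders')
```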
Below we look at some Pulsar adoption stories from the past 12 months:
The survey report shows that once an organization adopts Pulsar, its use tends to expand across that organization. Tencent and Iterable are just two examples of companies where this has happened. When asked, “Will your organization build more applications on Pulsar in 2021?”, 66% said “Yes” and another 10% said “Under Consideration.” In other words, 76% of respondents are planning or considering expanding their use of Pulsar.
The adoption of Pulsar is being driven by a larger industry move to the cloud and Kubernetes. As part of this move, organizations are looking for technologies that run natively in the cloud, scale well, and take full advantage of Kubernetes.
Technologies that are single-tenant, that have monolithic architectures, and that lack geo-replication and multi-cloud capabilities are not able to meet the needs of modern data applications. As a result, companies are increasingly adopting cloud-native technologies, such as Pulsar, to meet their business needs.
The move to Kubernetes is not a simple lift-and-shift. It requires new development models and new ways of working, and it is causing companies to re-evaluate how existing technologies will be deployed and managed in the cloud. For example, technologies such as Kafka that were designed before the cloud was commonplace can be difficult to map to the capabilities of the cloud and Kubernetes. These factors are leading companies to adopt best-of-breed cloud-native technologies, including Pulsar.
Companies today are looking for a complete streaming solution, and Pulsar’s integration with Flink is significant because it gives the Pulsar community another differentiator. From the 2020 Survey to the 2021 Survey, the number of Pulsar + Flink use cases almost doubled. As noted above, Pulsar adoption is often driven by companies seeking to enable new use cases, and the Pulsar + Flink integration is one example of this.
Stream processors, such as Kafka Streams, are adept at relatively simple processing of streaming data and computing answers close to real-time, but they are not a good fit for processing large historical datasets or datasets that require many joins and complex analysis. Many organizations need to run both batch and streaming data processors in order to gain the insights they need for their business, but maintaining multiple systems is expensive and complex.
More recently, systems have been developed that can do both batch and stream processing; Apache Flink is one example. Flink is currently used for stream processing with both Kafka and Pulsar. However, Flink's batch capabilities are not a good fit for Kafka, because Kafka can only deliver data as a stream, which makes it too slow for most batch workloads.
Pulsar's tiered storage model provides the batch storage capabilities needed to support batch processing in Flink. With Flink + Pulsar, companies are able to query both historical and real-time data quickly and easily, unlocking a unique competitive advantage.
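The idea behind tiered storage supporting batch work is simple: older, sealed segments of a topic move to cheaper storage, yet a reader can still scan the full history in order, right up to the newest data. The Python sketch below is a toy model of that idea, not Pulsar's implementation or API. `TieredTopic`, `segment_size`, and `max_hot_segments` are hypothetical names; in Pulsar itself, closed ledgers are offloaded to long-term storage such as object stores, either manually or when a configurable size threshold is reached.

```python
from dataclasses import dataclass, field


@dataclass
class TieredTopic:
    """Toy tiered log: recent segments on fast storage, older ones offloaded."""

    segment_size: int = 3        # messages per segment (hypothetical value)
    max_hot_segments: int = 2    # sealed segments kept on fast storage before offload
    cold: list = field(default_factory=list)          # offloaded sealed segments
    hot: list = field(default_factory=list)           # sealed segments on fast storage
    open_segment: list = field(default_factory=list)  # segment currently being written

    def append(self, message):
        self.open_segment.append(message)
        if len(self.open_segment) == self.segment_size:
            # Seal the segment; only sealed segments are eligible for offload.
            self.hot.append(self.open_segment)
            self.open_segment = []
            while len(self.hot) > self.max_hot_segments:
                self.cold.append(self.hot.pop(0))  # move the oldest sealed segment

    def read(self, start=0):
        """Return every message from offset `start` onward, oldest first.

        The caller never needs to know which tier holds a message, so the same
        call serves a full historical scan (batch) and a catch-up on fresh data.
        """
        segments = self.cold + self.hot + [self.open_segment]
        return [m for segment in segments for m in segment][start:]


topic = TieredTopic()
for i in range(10):
    topic.append(i)
print(len(topic.cold), len(topic.hot), topic.open_segment)  # 1 2 [9]
print(topic.read())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  full history, batch-style
print(topic.read(8))  # [8, 9]  only the newest messages, streaming-style
```

Keeping one ordered read path across tiers is the design choice that matters here: a batch job and a streaming job can consume the same topic without a separate copy of the historical data.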
(1) “Monthly Active Contributors.” API7, 10 June 2021, https://www.apiseven.com/en/contributor-graph?chart=contributorMonthlyActivity&repo=apache/pulsar,apache/kafka