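About Apache Pulsar

Apache Pulsar is a top-level project of the Apache Software Foundation and a next-generation, cloud-native distributed messaging and streaming platform. It integrates messaging, storage, and lightweight functional computing; adopts an architecture that separates compute from storage; supports multi-tenancy, persistent storage, and cross-region replication across data centers; and offers stream-storage characteristics such as strong consistency, high throughput, low latency, and high scalability.
GitHub: http://github.com/apache/pulsar/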

Introduction

Hello everyone, the Pulsar community monthly report for February 2021 is here! What's new in Pulsar at the start of the new year? Let's take a look together.

Thanks to the following community members for supporting the Pulsar project this month and keeping Apache Pulsar shining! (Listed in no particular order; see if you made the list):

codelipenghui, hnail, merlimat, massakam, eolivelli, 315157973, addisonj, freeznet, aloyszhang, RobertIndie, rdhabalia, jerrypeng, MarvinCai, hangc0276, sijie, kumar-singh, congbobo184, Renkai, yang, k2la, xxxxpenny, Jennifer88huang, Huanli-Meng, jbellis, dragonls, JipeiWang, potiuk, WJL3333, sbourkeostk, gaoran10, wineway, wangjialing218, odmark, sijia-w, Anonymitaet, limingnihao, aahmed-se, BewareMyPower, xche, fantapsody, jdbeck, danielorf, tuteng, pointearth, murong00, sunxiaoguang

Product Progress

SQL: [PIP-71] Migrate SchemaHandle to the Presto decoder in Pulsar SQL.
https://github.com/apache/pulsar/pull/8422

Broker: [PIP-45] Implement a distributed ID generator using the coordination service.
https://github.com/apache/pulsar/pull/9274

Metadata: [PIP-45] Add session events to the metadata store.
https://github.com/apache/pulsar/pull/9273

Notable Progress

Admin

Admin: Support getting the MaxConsumers policy applied to a topic.
https://github.com/apache/pulsar/pull/9296

Admin: Support the schema REST API for V1 topics.
https://github.com/apache/pulsar/pull/9218

Admin: Ensure admin operations on the state store are non-blocking.
https://github.com/apache/pulsar/pull/9348

Pulsar Admin: Expose the schema ledger in topic stats-internal.
https://github.com/apache/pulsar/pull/9284

Build

Build: Support building Pulsar on Windows with respect to line endings.
https://github.com/apache/pulsar/pull/9536

Build: Add the -parameters command-line argument to javac.
https://github.com/apache/pulsar/pull/9624

Build: Update the Pulsar and Dashboard Dockerfiles to create and use a Pulsar user (a non-root user).
https://github.com/apache/pulsar/pull/8796

Build: Add Java 11 to the Pulsar build image and upgrade dependencies.
https://github.com/apache/pulsar/pull/9609

Build: Support building Pulsar with JDK 11 and -Dmaven.compiler.release=8.
https://github.com/apache/pulsar/pull/9580

Broker

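Broker: Skip clearing delayed messages when the dispatcher has not been initialized.
https://github.com/apache/pulsar/pull/9378

Broker: Use the metadata store API for topic resources.
https://github.com/apache/pulsar/pull/9485

Broker: Support enabling subscription types.
https://github.com/apache/pulsar/pull/9401

Broker: Add subscription backlog size information to topic stats.
https://github.com/apache/pulsar/pull/9302

Broker: Use the metadata store API in the authorization service.
https://github.com/apache/pulsar/pull/9586

Broker: Ignore the .replicateSubscriptionState(true) setting when replicated subscriptions are disabled on the Pulsar broker.
https://github.com/apache/pulsar/pull/9523

Broker: Use the metadata store API for namespace resources.
https://github.com/apache/pulsar/pull/9351

Broker: Ensure that subscriptions start consuming from MessageId.latest.
https://github.com/apache/pulsar/pull/9444

Broker: Add a streaming dispatcher.
https://github.com/apache/pulsar/pull/9056

Broker: Handle bad-version errors in the metadata store when the broker updates ZooKeeper metadata through the ZooKeeper client.
https://github.com/apache/pulsar/pull/9412

Broker: Add alerts for tokens that are about to expire or have already expired.
https://github.com/apache/pulsar/pull/9321

Broker: Add a new method so that an asynchronous read of entries with a configured maximum number of bytes returns immediately, even when the entries contain no bytes.
https://github.com/apache/pulsar/pull/9532

Broker: Disallow parsing tokens that lack a signature in authenticateToken.
https://github.com/apache/pulsar/pull/9172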

Functions

Function: Support creating RFC 1123-compliant labels for functions on the Kubernetes runtime.
https://github.com/apache/pulsar/pull/9556

Function: Set the maximum amount of resources supported by Pulsar Functions.
https://github.com/apache/pulsar/pull/9584

Function: Add downloadDirectory support to the Kubernetes runtime.
https://github.com/apache/pulsar/pull/9377

Functions: Expose the pulsar-admin client through the Function context.
https://github.com/apache/pulsar/pull/9246

Functions: Support setting a memory limit for the Pulsar client in the ThreadRuntime of Pulsar Functions.
https://github.com/apache/pulsar/pull/9320

Functions: Optimize the startup of built-in sources/sinks by eliminating redundant NAR unpacking and checksum calculation.
https://github.com/apache/pulsar/pull/9413

Functions: Enhance the Kubernetes manifest customizer with default options.
https://github.com/apache/pulsar/pull/9445

Function/IO: Optimize the startup of built-in sources/sinks.
https://github.com/apache/pulsar/pull/9500

Function: Allow the Function Worker to write messages to internal topics using a producer in exclusive mode.
https://github.com/apache/pulsar/pull/9275
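
To illustrate what "RFC 1123-compliant labels" means in the first Functions item above, here is a minimal Python sketch. It is not the code from the Pulsar pull request; the helper names to_rfc1123_label and is_rfc1123_label are hypothetical. It assumes the DNS label form of RFC 1123: at most 63 characters, lowercase letters, digits, and hyphens only, starting and ending with a letter or digit.

```python
import re

MAX_LABEL_LEN = 63  # RFC 1123 limits a single DNS label to 63 characters


def to_rfc1123_label(name: str) -> str:
    """Hypothetical helper: coerce an arbitrary string into an RFC 1123 label."""
    # Lowercase, then replace each run of disallowed characters with one hyphen.
    label = re.sub(r"[^a-z0-9-]+", "-", name.lower())
    # Truncate first, then trim hyphens so the result never starts or ends with '-'.
    label = label[:MAX_LABEL_LEN].strip("-")
    if not label:
        raise ValueError(f"cannot derive an RFC 1123 label from {name!r}")
    return label


def is_rfc1123_label(label: str) -> bool:
    """Hypothetical helper: check a string against the RFC 1123 label rules."""
    return re.fullmatch(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?", label) is not None
```

For example, to_rfc1123_label("Public/Default/My_Function") yields "public-default-my-function", which passes is_rfc1123_label.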

Flaky Test

Flaky Test: Consistently pass alwaysRun = true to all TestNG @After* annotations.
https://github.com/apache/pulsar/pull/9635

Flaky Test: Add an acknowledgment timeout setting to RetryTopicTest#testRetryTopicWithMultiTopic.
https://github.com/apache/pulsar/pull/9334

Dead Letter Queue: Avoid enabling DLQ in Key_Shared subscription mode.
https://github.com/apache/pulsar/pull/9163

Test: Add a test for creating a producer on a non-persistent replicated topic.
https://github.com/apache/pulsar/pull/9322

Managed Ledger: Support configuring ManagedLedger storage.
https://github.com/apache/pulsar/pull/9397

C++

C++: Support configuring the connection pool in the pulsar-perf CLI tool.
https://github.com/apache/pulsar/pull/8913

C++: Fix corruption of messages replicated to specified clusters.
https://github.com/apache/pulsar/pull/9372

C++: Remove the Boost::System runtime dependency.
https://github.com/apache/pulsar/pull/9498

C++: Support disabling the static or dynamic library during the build.
https://github.com/apache/pulsar/pull/9570

C++: Account for variable names that differ across CMake versions.
https://github.com/apache/pulsar/pull/9559

C++: Add detailed logs for schema-related messages.
https://github.com/apache/pulsar/pull/9544

C++: Remove the use of boost::regex.
https://github.com/apache/pulsar/pull/9533

CLI

CLI: Support end-to-end encryption in the pulsar-client CLI tool.
https://github.com/apache/pulsar/pull/9615

CLI: Add user-friendly comments to the topic compaction tool.
https://github.com/apache/pulsar/pull/9563

CLI: Change the cluster parameter of the delete-cluster-metadata command from -cluster to --cluster.
https://github.com/apache/pulsar/pull/9487

CLI: Add a command to the pulsar-admin CLI tool for listing bookies.
https://github.com/apache/pulsar/pull/9283

CLI: Expire messages by message position.
https://github.com/apache/pulsar/pull/9514

Client

Client: Add a default implementation of CryptoKeyReader.
https://github.com/apache/pulsar/pull/9379

Client: Support processing messages with multiple listener threads in Key_Shared subscription mode.
https://github.com/apache/pulsar/pull/9329

Client: Make the function dependencies an optional component of the Pulsar Python client.
https://github.com/apache/pulsar/pull/9719

Client: Refactor code to reuse logic.
https://github.com/apache/pulsar/pull/9670

Client: Expose ReachedEndOfTopic in the Reader/Consumer API.
https://github.com/apache/pulsar/pull/9381

Topic Policy

Topic Policy: Support getting the retention policy applied to a topic.
https://github.com/apache/pulsar/pull/9362

Topic Policy: Support getting the offloader policy applied to a topic.
https://github.com/apache/pulsar/pull/9505

Metrics

Metrics: Add producer metrics to Prometheus.
https://github.com/apache/pulsar/pull/9541

Metrics: Reduce the CPU consumed when generating metrics.
https://github.com/apache/pulsar/pull/9735

Metrics: Add metrics for producer throttling.
https://github.com/apache/pulsar/pull/9649

Other

CI: Replace the diff-only behavior with the paths-ignore behavior.
https://github.com/apache/pulsar/pull/9527

TestClient: Add the --batch-index-ack command-line argument to the pulsar-perf CLI tool.
https://github.com/apache/pulsar/pull/9521

Pulsar Common: Add a user-friendly error message that tells users the required namespace format when a namespace parsing error occurs.
https://github.com/apache/pulsar/pull/9510

Transaction: Enable SpotBugs in the pulsar-transaction module.
https://github.com/apache/pulsar/pull/9620

Python: Support end-to-end encryption.
https://github.com/apache/pulsar/pull/9588

Pulsar-IO: Add the auto.offset.reset option to the Kafka source connector.
https://github.com/apache/pulsar/pull/9482

Support getting the DeduplicationStatus policy applied to a topic.
https://github.com/apache/pulsar/pull/9339

Compression: Fix a ByteBuffer allocation error in AirliftUtils.
https://github.com/apache/pulsar/pull/9667

BookKeeper: Support configuring BookKeeper's Opportunistic Striping feature.
https://github.com/apache/pulsar/pull/9232

Tiered Storage: Allow AWS credentials to be refreshed.
https://github.com/apache/pulsar/pull/9387
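
The Pulsar Common item above concerns telling users the expected namespace format when parsing fails. As a rough illustration only (not the code from that pull request), the Python sketch below assumes the two namespace forms Pulsar uses, tenant/namespace and the legacy tenant/cluster/namespace. The function name parse_namespace and the wording of the error message are hypothetical.

```python
def parse_namespace(name: str) -> tuple:
    """Hypothetical helper: split a namespace name and fail with a helpful message."""
    parts = name.split("/")
    # Accept exactly two or three non-empty segments.
    if len(parts) not in (2, 3) or any(not part for part in parts):
        # Assumed wording; the real broker message may differ.
        raise ValueError(
            f"Invalid namespace {name!r}: expected 'tenant/namespace' "
            "or the legacy 'tenant/cluster/namespace' form"
        )
    return tuple(parts)
```

Calling parse_namespace("public/default") returns the two segments, while parse_namespace("public") raises a ValueError that names the accepted forms instead of a bare parsing failure.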

Bug Fixes

Flaky Test

Flaky Test: Fix the Bookie dbStorage_*CacheMaxSizeMb settings in the pulsar-test-latest-version Docker image.
https://github.com/apache/pulsar/pull/9623

Flaky Test: Fix testBrokerSelectionForAntiAffinityGroup by increasing OverloadedThreshold.
https://github.com/apache/pulsar/pull/9393

Flaky Test: Fix the flaky test BrokerServiceLookupTest.testModularLoadManagerSplitBundle.
https://github.com/apache/pulsar/pull/9577

Flaky Test: Fix the flaky test TransactionMetaStoreAssignAndFailover.
https://github.com/apache/pulsar/pull/9506

Flaky Test: Fix the flaky LeaderElection test.
https://github.com/apache/pulsar/pull/9443

Flaky Test: Fix an NPE in test cleanup methods.
https://github.com/apache/pulsar/pull/9442

Flaky Test: Fix the failing LoadReportNetworkLimit test.
https://github.com/apache/pulsar/pull/9581

Flaky Test: Fix the flaky test ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions.
https://github.com/apache/pulsar/pull/9420

Flaky Test: Fix the flaky test V1_ProducerConsumerTest.testConcurrentConsumerReceive.
https://github.com/apache/pulsar/pull/9435

Flaky Test: Fix the flaky test MessagePublishBufferThrottleTest.
https://github.com/apache/pulsar/pull/9376

Flaky Test: Fix the flaky test TopicReaderTest.testMultiReaderIsAbleToSeekWithTimeOnMiddleOfTopic.
https://github.com/apache/pulsar/pull/9375

Admin

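Admin: Fix backlogQuota being updated successfully even when a check fails.
https://github.com/apache/pulsar/pull/9382

Admin: Fix an NPE that occurs when getting a message by ID if the message is null.
https://github.com/apache/pulsar/pull/9537

Admin CLI: Notify the user when a requested expire-messages operation was not performed.
https://github.com/apache/pulsar/pull/9561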

Functions

Go Functions: Fix a metrics server handler error.
https://github.com/apache/pulsar/pull/9394

Function: Fix a possible deadlock when the broker-function service starts.
https://github.com/apache/pulsar/pull/9499

Functions: Call the restart that corresponds to the component type.
https://github.com/apache/pulsar/pull/9519

Function: Fix reading metrics getting stuck indefinitely in some cases.
https://github.com/apache/pulsar/pull/9538

Functions: Close InputStreams properly.
https://github.com/apache/pulsar/pull/9568

Functions: Fix the inability to create functions with m-TLS.
https://github.com/apache/pulsar/pull/9260

Functions: Fix narExtractionDirectory configured in broker.conf not taking effect.
https://github.com/apache/pulsar/pull/9319

Broker

Broker: Fix an NPE and a cache invalidation issue during leader election.
https://github.com/apache/pulsar/pull/9460

Broker: Fix the partition count not matching the expected value.
https://github.com/apache/pulsar/pull/9446

Broker: Fix the msgDelayed metric for partitioned topics.
https://github.com/apache/pulsar/pull/9529

Broker: Fix the RestException thrown when creating partitions for an existing topic.
https://github.com/apache/pulsar/pull/9342

Broker: Remove references to global-zk from Pulsar-service.
https://github.com/apache/pulsar/pull/9648

Broker: Fix the timeout operation when a wait-for-exclusive producer is handled improperly.
https://github.com/apache/pulsar/pull/9600

Broker: Fix an NPE in PersistentStickyKeyDispatcherMultipleConsumers that occurs when debug logging is enabled.
https://github.com/apache/pulsar/pull/9587

Broker: Log only auth errors at the error level.
https://github.com/apache/pulsar/pull/9325

Broker: Fix automatic topic compaction never being triggered when no durable subscription is found.
https://github.com/apache/pulsar/pull/8204

Broker: Fix a race condition in the BrokerService topic cache.
https://github.com/apache/pulsar/pull/9565

Broker: Fix the exception thrown when getting an optional field of a PB message.
https://github.com/apache/pulsar/pull/9468

Broker: Fix an issue with ZooKeeper children watch notifications.
https://github.com/apache/pulsar/pull/9473

Client

Client: Clean up consumers when a multi-topic subscription fails.
https://github.com/apache/pulsar/pull/9419

Client: Deduct a message's data size when the message is dequeued.
https://github.com/apache/pulsar/pull/8566

Client: Ensure compression is applied during deferred schema preparation when enableBatching is enabled.
https://github.com/apache/pulsar/pull/9396

Client: Make the DLQ process asynchronous.
https://github.com/apache/pulsar/pull/9552

Client: Fix a bug in producers that use the exclusive access mode.
https://github.com/apache/pulsar/pull/9554

Client: Fix the batch-message version of acknowledgeAsync returning a completed future before the individual messages are completed.
https://github.com/apache/pulsar/pull/9383

Client: Add BouncyCastleProvider as a security provider to prevent an NPE.
https://github.com/apache/pulsar/pull/9601

Client: Fix writing/encoding of GenericJsonRecord.
https://github.com/apache/pulsar/pull/9608

Client: Fix hasMessageAvailable() returning the wrong result for an empty topic.
https://github.com/apache/pulsar/pull/9652

TestClient: Fix the logic in ManagedLedgerWriter when threadNum >= ledgerNum is configured.
https://github.com/apache/pulsar/pull/9497

C++

C++: Add the encrypted option to commands.newproducer().
https://github.com/apache/pulsar/pull/9542

C++: Remove the namespace check from MultiTopicsConsumerImpl.
https://github.com/apache/pulsar/pull/9520

C++: Fix SinglePartitionMessageRouter always selecting the same partition.
https://github.com/apache/pulsar/pull/9702

Build

Build: Add the missing Python 3.9 path to CMakeLists.txt.
https://github.com/apache/pulsar/pull/9574

Build: Remove the pom.xml file from the discontinued pulsar-functions-worker-shaded module.
https://github.com/apache/pulsar/pull/9637

Build: Reduce the size of the pulsar-test-latest-version Docker image.
https://github.com/apache/pulsar/pull/9627

Build: Fix expired TLS certificates in the C++ tests.
https://github.com/apache/pulsar/pull/9607

Build: Disable looking up snapshot versions in public repositories.
https://github.com/apache/pulsar/pull/9336

Build: Avoid pulling BookKeeper-common into pulsar-common.
https://github.com/apache/pulsar/pull/9551

Schema Registry

Schema Registry: Fix an NPE in BookkeeperSchemaStorage.
https://github.com/apache/pulsar/pull/9264

Schema Registry: Fix topic loading failures caused by a corrupted schema ledger.
https://github.com/apache/pulsar/pull/9212

Release

Release: Fix the marking of individual deletes as dirty.
https://github.com/apache/pulsar/pull/9732

Release: Use an atomic field updater to increment the volatile messagesConsumedCounter.
https://github.com/apache/pulsar/pull/9656

Release: Keep the max-subscriptions command style consistent.
https://github.com/apache/pulsar/pull/9750

Python

Python: Initialize the Python 3.9 client wheel build.
https://github.com/apache/pulsar/pull/9389

Python: Fix nested Map or Array in schemas not working.
https://github.com/apache/pulsar/pull/9548

Python: Match BookKeeper's grpcio requirement.
https://github.com/apache/pulsar/pull/9569

CI

CI: Fix the CI builds for the website and macOS writing excessive logs (caused by a missing -B option).
https://github.com/apache/pulsar/pull/9539

CI: Fix cancelling duplicate workflow runs.
https://github.com/apache/pulsar/pull/9503

Other

Docker: Fix an incorrect URL in the docker-compose README file.
https://github.com/apache/pulsar/pull/9573

CLI: Avoid duplicate options.
https://github.com/apache/pulsar/pull/9469

Managed Ledger: Fix a persistence issue with batch index acknowledgments.
https://github.com/apache/pulsar/pull/9504

Transaction: Fix a memory leak in deleteTransactionMarker.
https://github.com/apache/pulsar/pull/9751

Test: Add a debugging feature that keeps integration test containers running after the tests finish.
https://github.com/apache/pulsar/pull/9626

Bookie: Fall back to PULSAR_GC if BOOKIE_GC is not defined.
https://github.com/apache/pulsar/pull/9621

Ecosystem

Pulsar SQL: Fix incorrect results when Pulsar SQL queries data with a bytes schema.
https://github.com/apache/pulsar/pull/9631

Pulsar SQL: Fix the duplicate pfn_input_topic key in the Presto server.
https://github.com/apache/pulsar/pull/9686

Pulsar SQL: Add tests for Pulsar SQL to support the keyValue schema.
https://github.com/apache/pulsar/pull/9388

Metrics: Add metrics to ensure the durability of the cursor acknowledgment state.
https://github.com/apache/pulsar/pull/9618

Technical Content

• TGIP 019: Latest Updates On Apache Pulsar[1]
• How to choose Pulsar vs Kafka?[2]
• Migrate to Serverless with Pulsar Functions[3]
• Pulsar Office Hour: Monthly live stream about Pulsar best practices, use cases, and more.[4]
• StreamNative's 2020 Year in Review[5]
• Iterable: From Kafka to Pulsar, We Made the Right Choice!
• An Analysis of Delayed Message Delivery in Apache Pulsar
• Four Reasons Why Apache Pulsar Is Essential to the Modern Data Stack
• A Deep Dive into Pulsar Functions
• Pulsar Storage Space Not Being Released: Analysis and Solutions
• 10 Reasons to Choose Apache Pulsar over Kafka


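That wraps up the Pulsar journey for February 2021. Apache Pulsar is growing rapidly, and we thank everyone for their support!

The Apache Pulsar community encourages everyone to take an active part in open source. Whether through documentation, code, translation, or technical blog posts, all contributions are welcome. We hope you become a Pulsar contributor soon!

If you are not yet familiar with the Pulsar contribution process, this short tutorial explains how to contribute to Pulsar through GitHub: Beginner's Guide | How Non-Technical Contributors Can Contribute to the Pulsar Project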

 

References

[1] TGIP 019: Latest Updates On Apache Pulsar: https://www.youtube.com/watch?v=EkRULmkaWGY
[2] How to choose Pulsar vs Kafka?: https://developpaper.com/how-to-choose-pulsar-vs-kafka/
[3] Migrate to Serverless with Pulsar Functions: https://streamnative.io/en/blog/tech/2021-02-10-migrate-to-serverless-with-pulsar-functions
[4] Pulsar Office Hour: Monthly live stream about Pulsar best practices, use cases, and more.: https://www.youtube.com/watch?v=Xh94uDE1pg4
[5] StreamNative's 2020 Year in Review: https://streamnative.io/en/blog/community/2021-02-16-streamnative-2020-year-in-review