What's New in This Release

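Added RBAC (role-based access control), which applies to all engines and all language APIs (see the docs).
Old compaction data is now cleaned up automatically, and partition-level lifecycle (TTL) is supported (see the docs).
Upgraded Flink to 1.17, with support for row-level updates and deletes in batch mode.
Optimized the Flink whole-database sync job, improving throughput by 80% (#307).
Added support for reading from Presto (see the docs).
Added native Python reads, with integrations for PyTorch and HuggingFace (see the docs).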
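
To make the partition-level lifecycle (TTL) item above more concrete, here is a minimal conceptual sketch of how expired partitions could be selected. It is not LakeSoul's implementation or API: the function `expired_partitions`, the partition names, and the rule "a partition expires when its newest commit is older than the TTL" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def expired_partitions(last_commit_by_partition, ttl_days, now):
    """Return (sorted) names of partitions whose newest commit is older than ttl_days.

    Hypothetical helper: the expiry rule is an assumption, not LakeSoul's actual logic.
    """
    cutoff = now - timedelta(days=ttl_days)
    return sorted(
        name
        for name, last_commit in last_commit_by_partition.items()
        if last_commit < cutoff
    )


# Made-up partitions mapped to the time of their most recent commit.
partitions = {
    "date=2023-01-01": datetime(2023, 1, 1, 12, 0),
    "date=2023-06-01": datetime(2023, 6, 1, 12, 0),
    "date=2023-08-01": datetime(2023, 8, 1, 12, 0),
}

to_drop = expired_partitions(partitions, ttl_days=90, now=datetime(2023, 8, 15))
print(to_drop)
```

With a 90-day TTL evaluated on 2023-08-15, only the January partition falls before the cutoff in this toy data.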

Changelog

update maven versions
Support cdc column value change in compaction (#336)
[Docs] Refine flink sql and python docs (#337)
Support view, batch update, batch delete in flink (#332)
[Bug] filter should not pushdown before merge on read (#310)
[Bug] turn off native meta query and set prefetch_size=1 temporarily (#333)
add workspace and rbac docs (#331)
add doc machine-learning-support.md (#330)
[Docs] Add presto connector deployment docs (#329)
[Docs] Usage on auto table clean (#326)
update docs and readme for release 2.4
update docs and readme for release 2.4 (#328)
[Python] Examples using Python API for AI model training (#327)
[Python][Dataset] Update Python dataset api for LakeSoul (#325)
list namespace should return empty array (#323)
support query metadata with null string (#324)
[RBAC] Set hdfs dir owner (#321)
bump version to 2.4.0 (#319)
merge native-io and native-metadata modules (#318)
ignore exception when hadoop env missing (#317)
add scala in common to address build in idea intellij (#316)
Presto Connector Support (#314)
[Flink] cdc supplement data delay check mechanism and fix logicallyDro…
retry when native metadata client fails (#313)
add arg exclude_partition at get_arrow_schema_by_table_name (#312)
[NativeIO] add hdfs feature in lakesoul-io-c (#311)
[Flink] Optimize CDC sink serde with Fury (#307)
rollback flink cdc to 2.3.0 and supplement tables check in benchmark (#309)
[Python][Dataset] PyArrow and PyTorch dataset api for LakeSoul (#308)
[Python] C callback with data (#306)
[Python][Native-Metadata] Python interface of lakesoul metadata (#305)
clean old compaction data and redundant data (#304)
upgrade flink cdc connector to 2.4 (#303)
update docs (#298)
fix jackson-core package in flink (#297)
[Native-Metadata] Rust implementation of DAO layer (#294)
fix apache license (#293)
Add Built-in RBAC support (#292)
update arrow version (#290)
[Python][NativeIO] Add C interface definition (#291)
Upgrade Flink to 1.17 (#288)
[Flink] implement filter pushdown and fix partition pushdown in flink (#287)
[NativeIO] Upgrade datafusion to 27 (#282)
add 2.3.0 release blog (#281)
[Doc] Add Spark session catalog doc (#279)
[Project] add/change spdx header to all files (#278)
format table path with hdfs configuration (#277)
[Metadata] refactor metadata entity with protobuf code-gen (#276)
[RBAC] PG workspace implementation (#274)
partition rollback/cleanup add timeZoneId (#275)
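
The changelog entry for #310 ("filter should not pushdown before merge on read") can be illustrated with a toy model. The sketch below is not LakeSoul code: the row layout, the primary key `id`, and the helpers `merge_on_read` and `keep` are hypothetical. It only shows why filtering on a non-key column before merging base and delta data can resurrect a stale row.

```python
def merge_on_read(base_rows, delta_rows):
    """Upsert delta rows over base rows by primary key "id"; the newer row wins."""
    merged = {row["id"]: row for row in base_rows}
    merged.update({row["id"]: row for row in delta_rows})
    return list(merged.values())


def keep(row):
    """Example predicate on a non-key column: v > 5."""
    return row["v"] > 5


base = [{"id": 1, "v": 10}, {"id": 2, "v": 7}]
delta = [{"id": 1, "v": 3}]  # newer version of id=1, which no longer matches v > 5

# Merge first, then filter: id=1 is correctly excluded.
correct = [r for r in merge_on_read(base, delta) if keep(r)]

# Filter each file first, then merge: the update to id=1 is dropped
# before the merge, so the stale base row {"id": 1, "v": 10} survives.
wrong = merge_on_read([r for r in base if keep(r)], [r for r in delta if keep(r)])

print(correct)
print(wrong)
```

In this model the two orders disagree exactly when the newest version of a key fails the predicate while an older version passes it.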

Full changelog: https://gitee.com/meta-soul/LakeSoul/compare/v2.3.1...v2.4.0

Last commit message: update maven versions

v2.3.0

aac4dfa

2023-07-19 14:12

Changelog

This is the first release since LakeSoul was donated to Linux Foundation AI & Data. It includes the following major updates:

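Full support for Flink SQL and the Table API. LakeSoul supports streaming and batch reads and writes in Flink. Streaming reads and writes fully support Flink changelog semantics, including row-level streaming inserts, updates, and deletes. See the reference docs.
Refactored Flink CDC whole-database sync, which now infers new tables and schema changes automatically from messages. This makes CDC ingestion jobs simpler to develop and allows consuming CDC streams from any database as well as message queue streams.
Global automatic compaction service. Reference doc: "Using the LakeSoul Global Automatic Compaction Service".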

Last commit message: change version to 2.3.0

v2.2.0

c3b83c1

2023-04-20 17:36

v2.2.0 Release Notes

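Native IO is now enabled by default on Flink and Spark. Native IO is implemented with arrow-rs and [DataFusion](https://github.com/apache/arrow-datafusion), with dedicated performance optimizations on top of the arrow-rs object store. In real-world tests it is more than 3x faster than parquet-mr with the Hadoop filesystem. Native IO supports HDFS and S3 storage, as well as S3-compatible storage systems. It has been tested thoroughly, supports all Flink and Spark data types, and has passed TPC-H and CHBenchmark correctness checks.
Snapshot reads and incremental reads are now supported on Spark. Incremental reads work in both batch mode and micro-batch streaming mode.
The default Spark version has been updated to 3.3.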
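
The difference between the snapshot reads and incremental reads mentioned above can be sketched with a toy commit log. This is a conceptual model only, not the LakeSoul or Spark API: the functions `snapshot_read` and `incremental_read`, the integer timestamps, and the boundary convention (start exclusive, end inclusive) are assumptions chosen for illustration.

```python
def merge(commits):
    """Apply commits in order; later rows overwrite earlier rows with the same id."""
    state = {}
    for _, rows in commits:
        for row in rows:
            state[row["id"]] = row["v"]
    return state


def snapshot_read(commits, end_ts):
    """Table state as of end_ts: merge every commit with timestamp <= end_ts."""
    return merge([c for c in commits if c[0] <= end_ts])


def incremental_read(commits, start_ts, end_ts):
    """Only the rows written after start_ts and up to end_ts (assumed bounds)."""
    return merge([c for c in commits if start_ts < c[0] <= end_ts])


# (timestamp, rows) pairs, already ordered by timestamp.
commits = [
    (1, [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]),
    (2, [{"id": 1, "v": "a2"}]),
    (3, [{"id": 3, "v": "c"}]),
]

snap = snapshot_read(commits, end_ts=2)
inc = incremental_read(commits, start_ts=1, end_ts=3)
print(snap)
print(inc)
```

In this model a snapshot read returns the whole table as it stood at a point in time, while an incremental read returns only what changed inside a time window.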

Last commit message: prepare 2.2.0 release
