meta-soul / LakeSoul

2024-01-10 14:18
meta-soul
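  1. Python Reader supports PyTorch, PyArrow, Pandas, and Ray, with distributed execution;
  2. Supports the Spark Gluten Vectorized Engine;
  3. Spark SQL supports call procedures such as Compaction and Rollback;
  4. Flink CDC whole-database synchronization supports MySQL, PostgreSQL, PolarDB, and Oracle;
  5. Supports streaming and batch export from the lake to MySQL, PostgreSQL, PolarDB, and Apache Doris;
  6. Improved NativeIO performance.
Last commit message: update version to 2.5.0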
2023-09-21 19:10
meta-soul

What's new in this release

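  1. Supports RBAC (role-based access control), effective across all engines and all language APIs (docs);
  2. Automatically cleans up old compaction data and supports partition-level lifecycle (TTL) (docs);
  3. Upgrades Flink to 1.17 and supports row-level updates and deletes in batch mode;
  4. Optimizes the whole-database synchronization Flink job, improving throughput by 80% (#307);
  5. Supports reading from Presto (docs);
  6. Supports native Python reads, with PyTorch and HuggingFace integration (docs).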
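
As a rough illustration of the partition-level lifecycle (TTL) idea in item 2, the sketch below selects partitions whose last commit is older than a TTL. It is not LakeSoul's implementation or API: `PartitionInfo`, `expired_partitions`, the field names, and the 30-day TTL are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class PartitionInfo:
    # Hypothetical record for illustration; not a LakeSoul metadata type.
    name: str
    last_commit: datetime


def expired_partitions(
    partitions: List[PartitionInfo], ttl_days: int, now: datetime
) -> List[str]:
    """Return names of partitions whose last commit is older than the TTL."""
    cutoff = now - timedelta(days=ttl_days)
    return [p.name for p in partitions if p.last_commit < cutoff]


now = datetime(2023, 9, 21)
parts = [
    PartitionInfo("date=2023-09-20", datetime(2023, 9, 20)),
    PartitionInfo("date=2023-08-01", datetime(2023, 8, 1)),
]
# With an assumed 30-day TTL, only the August partition is past the cutoff.
print(expired_partitions(parts, ttl_days=30, now=now))
```

A real cleanup service would also have to delete the underlying files and update metadata atomically; the sketch covers only the selection step.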

Changelog

  • update maven versions
  • Support cdc column value change in compaction (#336)
  • [Docs] Refine flink sql and python docs (#337)
  • Support view, batch update, batch delete in flink (#332)
  • [Bug]filter should not pushdown before merge on read (#310)
  • [Bug] turn off native meta query and set prefetch_size=1 temporarily (#333)
  • add workspace and rbac docs (#331)
  • add doc machine-learning-support.md (#330)
  • [Docs] Add presto connector deployment docs (#329)
  • [Docs] Usage on auto table clean (#326)
  • update docs and readme for release 2.4
  • update docs and readme for release 2.4 (#328)
  • [Python] Examples using Python API for AI model training (#327)
  • [Python][Dataset] Update Python dataset api for LakeSoul (#325)
  • list namespace should return empty array (#323)
  • support query metadata with null string (#324)
  • [RBAC] Set hdfs dir owner (#321)
  • bump version to 2.4.0 (#319)
  • merge native-io and native-metadata modules (#318)
  • ignore exception when hadoop env missing (#317)
  • add scala in common to address build in idea intellij (#316)
  • Presto Connector Support (#314)
  • [Flink] cdc supplement data delay check mechanism and fix logicallyDro…
  • retry when native metadata client fails (#313)
  • add arg exclude_partition at get_arrow_schema_by_table_name (#312)
  • [NativeIO] add hdfs feature in lakesoul-io-c (#311)
  • [Flink] Optimize CDC sink serde with Fury (#307)
  • rollback flink cdc to 2.3.0 and supplement tables check in benchmark (#309)
  • [Python][Dataset] PyArrow and PyTorch dataset api for LakeSoul (#308)
  • [Python] C callback with data (#306)
  • [Python][Native-Metadata] Python interface of lakesoul metadata (#305)
  • clean old compaction data and redundant data (#304)
  • upgrade flink cdc connector to 2.4 (#303)
  • update docs (#298)
  • fix jackson-core package in flink (#297)
  • [Native-Metadata] Rust implementation of DAO layer (#294)
  • fix apache license (#293)
  • Add Built-in RBAC support (#292)
  • update arrow version (#290)
  • [Python][NativeIO] Add C interface definition (#291)
  • Upgrade Flink to 1.17 (#288)
  • [Flink] implement filter pushdown and fix partition pushdown in flink (#287)
  • [NativeIO] Upgrade datafusion to 27 (#282)
  • add 2.3.0 release blog (#281)
  • [Doc] Add Spark session catalog doc (#279)
  • [Project] add/change spdx header to all files (#278)
  • format table path with hdfs configuration (#277)
  • [Metadata] refractor metadata entity with protobuf code-gen (#276)
  • [RBAC] PG workspace implementation (#274)
  • partition rollback/cleanup add timeZoneId (#275)

Full changelog: https://gitee.com/meta-soul/LakeSoul/compare/v2.3.1...v2.4.0

Last commit message: update maven versions
2023-07-19 14:12
meta-soul

Changelog

This is the first release since LakeSoul was donated to Linux Foundation AI & Data. It includes the following major updates:

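  1. Full support for the Flink SQL/Table API. LakeSoul supports Flink streaming and batch reads and writes. Streaming reads and writes fully support Flink Changelog semantics, including row-level streaming inserts, updates, and deletes. See the reference documentation.
  2. Refactored Flink CDC whole-database synchronization, which now automatically infers new tables and schema changes from messages. This makes CDC ingestion jobs simpler to develop and supports consuming CDC streams from any database or from message queues.
  3. Global automatic Compaction service. Reference documentation: "How to use the LakeSoul global automatic compaction service".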
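
To make the Changelog semantics in item 1 concrete, here is a minimal sketch of how a stream of row-level changes can be folded into a keyed table. The row kinds `+I`, `-U`, `+U`, and `-D` are Flink's insert, update-before, update-after, and delete markers; everything else (`apply_changelog`, the tuple layout, the sample data) is hypothetical and does not reflect LakeSoul's or Flink's actual code.

```python
from typing import Dict, Iterable, Tuple

# (row kind, primary key, value); the tuple layout is an assumption for this sketch.
Change = Tuple[str, int, str]


def apply_changelog(changes: Iterable[Change]) -> Dict[int, str]:
    """Fold a changelog stream into the current state of a keyed table."""
    table: Dict[int, str] = {}
    for kind, key, value in changes:
        if kind in ("+I", "+U"):
            # Insert or update-after: the new value becomes the current row.
            table[key] = value
        elif kind in ("-U", "-D"):
            # Update-before or delete: retract the old row if present.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return table


changes = [
    ("+I", 1, "alice"),
    ("+I", 2, "bob"),
    ("-U", 1, "alice"),
    ("+U", 1, "alicia"),
    ("-D", 2, "bob"),
]
print(apply_changelog(changes))
```

After the five changes, only key 1 remains, carrying its updated value.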
Last commit message: change version to 2.3.0
2023-04-20 17:36
meta-soul

v2.2.0 Release Notes

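Native IO is enabled by default on Flink and Spark. Native IO is implemented with arrow-rs and Datafusion (https://github.com/apache/arrow-datafusion), with dedicated performance optimizations on the arrow-rs object store. In real-world tests it is more than 3x faster than parquet-mr with the Hadoop filesystem. Native IO supports HDFS and S3 storage, as well as S3-compatible storage systems. It has been thoroughly tested, supports all Flink and Spark data types, and has passed TPC-H and CHBenchmark correctness verification.
Spark now supports snapshot reads and incremental reads. Incremental reads work in both batch mode and micro-batch streaming mode.
The default Spark version is updated to 3.3.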
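
The difference between a snapshot read and an incremental read can be sketched as follows. This is a simplified, append-only model under stated assumptions, not LakeSoul's Spark API: `Commit`, `snapshot_read`, `incremental_read`, the integer timestamps, and the choice of an exclusive start and inclusive end are all hypothetical, and upsert merging is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    # Hypothetical commit record: a timestamp and the rows it appended.
    timestamp: int
    rows: List[str]


def snapshot_read(commits: List[Commit], as_of: int) -> List[str]:
    """All rows visible as of a timestamp (commits at or before it)."""
    return [r for c in commits if c.timestamp <= as_of for r in c.rows]


def incremental_read(commits: List[Commit], start: int, end: int) -> List[str]:
    """Only rows committed in the window (start, end]; bounds are an assumption."""
    return [r for c in commits if start < c.timestamp <= end for r in c.rows]


commits = [
    Commit(10, ["a", "b"]),
    Commit(20, ["c"]),
    Commit(30, ["d", "e"]),
]
print(snapshot_read(commits, as_of=20))
print(incremental_read(commits, start=10, end=30))
```

In this model a micro-batch streaming read amounts to repeating the incremental read with a moving window, each batch starting where the previous one ended.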

Last commit message: prepare 2.2.0 release