
v0.3.0

Pre-release


@hefanli released this on 31 Jan 08:22
2efb1be

Release Focus

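This DataMate release focuses on stronger data governance and multi-tenant isolation. It adds data lineage and new data collection capabilities, refines the annotation workflow and the operator ecosystem, unifies model configuration and data permissions, and improves platform stability and operability.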

Core Feature Updates

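Data Collection
  • New: API Reader collection plugin, which collects the data returned by API endpoints into DataMate as CSV files.
  • New: Generic relational database collection template (RDBMS Reader), adding collection support for relational databases such as PostgreSQL, openGauss, and SQL Server.

Data Processing
  • Changed: Updated how operators are displayed in data processing tasks.
  • New: Retry counts are displayed, and logs can be viewed per retry attempt.

Data Management
  • New: Data lineage page for tracing lineage across data collection, data management, and data cleansing.
  • New: Folder upload; in-browser preview of text and image file contents; viewing the labels of annotated files in the frontend.
  • Fixed: Metadata of files not yet stored in the database could not be viewed, and their content could not be previewed.

Data Annotation
  • Improved: Consolidated and refined the sync logic for assisted and manual annotation, enabling bidirectional sync between annotation-tool data and dataset data.
  • Improved: Kept primary actions (Sync, Edit) at the top level and moved less frequent actions (Delete Task, Edit Task Dataset, Export Annotation Results) into a secondary menu.
  • Fixed: Frontend display issues caused by incorrect pagination parameters.

Operator Market
  • New: Operators categorized by functionality.
  • New: Additional operator documentation and examples.
  • Improved: Frontend display (Card, OperatorServiceMonitor, Requirement, and other components).

Knowledge Generation
  • Improved: The knowledge graph now uses a 2D force-directed graph, with adjusted rendering logic.

Models and Configuration
  • Refactored: Unified model configuration and standardized the LLM client factory.
  • Improved: User context tracking when creating sessions and models.

Deployment
  • New: Docker images are tagged by branch.
  • New: MinerU adapted for 310P, with optimized build parameters.
  • New: SECURITY.md describing the security policy.
  • Fixed: Database log directory permissions; incorrect filtering when image-similarity tasks were retried repeatedly.
  • Fixed: Missing three.js dependency at build time.

User Management
  • New: Data collection, datasets, cleansing, synthesis, data ratio, evaluation, knowledge bases, and operators are isolated by creator; system-preset data is not isolated.
  • New: DataSetScope data permissions and UserContext propagation.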
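The creator-based isolation rule in the User Management row (resources visible only to their creator, system-preset data shared with everyone) can be sketched as a simple visibility filter. This is a minimal sketch under stated assumptions, not DataMate's implementation: the `Resource` and `CurrentUser` classes and the `SYSTEM_CREATOR` marker are hypothetical, since the release notes name `DataSetScope` and `UserContext` but do not describe their fields.

```python
from dataclasses import dataclass

# Hypothetical marker for system-preset resources; how DataMate actually
# flags preset data is not described in these release notes.
SYSTEM_CREATOR = "system"


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for any isolated resource (dataset, task, operator, ...)."""
    name: str
    created_by: str


@dataclass(frozen=True)
class CurrentUser:
    """Hypothetical per-request user identity, as a propagated user context would carry."""
    user_id: str


def is_visible(resource: Resource, user: CurrentUser) -> bool:
    # System-preset data is shared with every user; anything else is
    # visible only to the user who created it.
    return resource.created_by in (SYSTEM_CREATOR, user.user_id)


def visible_resources(resources: list[Resource], user: CurrentUser) -> list[Resource]:
    # Keep the original order so list pages stay stable.
    return [r for r in resources if is_visible(r, user)]


if __name__ == "__main__":
    catalog = [
        Resource("preset-cleaning-template", SYSTEM_CREATOR),
        Resource("alice-dataset", "alice"),
        Resource("bob-dataset", "bob"),
    ]
    # prints ['preset-cleaning-template', 'alice-dataset']
    print([r.name for r in visible_resources(catalog, CurrentUser("alice"))])
```

In a real service this check would more likely be pushed down into the database query (for example, a condition on the creator column) than applied to an in-memory list; the in-memory form just makes the rule easy to read.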

What's Changed

  • feat(data-management): add preview functionality for text and image items by @o0Shark0o in #259
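  • fix: add three.js dependency at build time by @Dallas98 in #261
  • fix: add three dependency by @hhhhsc701 in #262
  • fix: change database log directory permissions / fix incorrect filtering when image-similarity tasks are retried repeatedly by @hhhhsc701 in #264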
  • Fix: Annotation template paginate can not show when template count > 12 by @q792602257 in #260
  • feat: refactor KnowledgeGraphView to use 2D force graph and improve rendering logic by @Dallas98 in #266
  • fix: pagination parameter issue by @q792602257 in #265
  • feat(data-management): fix bulk upload issues and enhance UI & upload experience by @o0Shark0o in #269
  • feat(auto-annotation): sync tags and timestamps to datasets and optimize visibility by @o0Shark0o in #271
  • feat: optimize MinerU build and deployment parameters, adapt to 310P by @hhhhsc701 in #270
  • Update operator frontend display by @hhhhsc701 in #273
  • realize that data sets, data cleaning, and knowledge bases are isolated according to the creator, and operators are not isolated. by @hefanli in #268
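  • feat: add DeepWiki link by @hhhhsc701 in #278
  • feat: record retry counts and show logs for cleansing tasks by @hhhhsc701 in #280
  • feat: switch the database to a singleton pattern by @hhhhsc701 in #281
  • feat: add operator documentation and examples by @hhhhsc701 in #282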
  • feat(annotation): add bidirectional sync and flexible export for annotation tasks by @o0Shark0o in #284
  • feat: enhance dataset file download functionality to zip all files and improve path validation by @Dallas98 in #285
  • add data lineage page and data quality page by @hefanli in #287
  • Enhance README with new fields and unit updates by @hhhhsc701 in #288
  • feat: update Docker image tagging to use branch names for better identification by @Dallas98 in #289
  • feat: rename data cleansing references to data processing for consistency by @Dallas98 in #292
  • Create SECURITY.md for security policy by @yafengzhang2025 in #293
  • feat(annotation): simplify task creation and streamline sync workflow by @o0Shark0o in #294
  • Add database insertion by @hhhhsc701 in #295
  • Add null-value handling by @hhhhsc701 in #298
  • feat(annotation): refine sync behavior and add annotation export option by @o0Shark0o in #299
  • fixed the problem that the metadata of files that have not yet been stored in the database cannot be previewed. by @hefanli in #300
  • add apireader collection plug-in by @hefanli in #301
  • feature: added universal relational database collection template by @hefanli in #302
  • fixed issues where files inside folders could not be previewed and deleted by @hefanli in #303
  • refactor(models): unify model configuration and standardize LLM client factory by @Dallas98 in #283
  • feat(operator-market): reorganize operator categories by functionality by @o0Shark0o in #304
  • feat: enhance user tracking in session management and model creation by using effective user context by @Dallas98 in #305
  • fix: adapt to structural changes of COT data by @hefanli in #306
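PR #285 mentions zipping all dataset files for download together with stricter path validation. The idea can be sketched as follows, assuming a dataset is a directory on local disk and file paths are given relative to it. The function names `resolve_inside` and `zip_dataset` are hypothetical and are not DataMate's API; the sketch only shows the general technique of resolving a path and checking that it stays under the dataset root before reading it.

```python
import io
import zipfile
from pathlib import Path


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting any path that escapes it."""
    root = root.resolve()
    # resolve() normalizes "../" segments and follows symlinks; joining an
    # absolute path replaces root entirely, so both cases fail the check below.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"path escapes dataset root: {relative!r}")
    return candidate


def zip_dataset(root: Path) -> bytes:
    """Bundle every regular file under `root` into an in-memory ZIP archive."""
    root = root.resolve()
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            # Re-validate each entry so a symlink pointing outside the dataset
            # raises instead of being silently archived.
            resolve_inside(root, rel)
            # Store entries relative to the root to keep the folder structure.
            archive.write(path, arcname=rel)
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch short; for large datasets, streaming the archive to a temporary file or directly to the HTTP response would probably be the more practical choice.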

Full Changelog: v0.2.0...v0.3.0