v0.3.0
Pre-release
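Release Focus
This DataMate release focuses on stronger data governance and multi-tenant isolation: it fills in data lineage and data collection capabilities, improves the annotation workflow and operator ecosystem, unifies model configuration and data permissions, and improves platform stability and operability.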
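Core Feature Updates
| Module | Updates |
|---|---|
| Data Collection | • Added: API Reader collection plugin, which collects data returned by API endpoints into DataMate as CSV files. • Added: generic relational database collection template (RDBMS Reader), which adds collection support for relational databases such as postgres, opengauss, and sqlserver. |
| Data Processing | • Changed: updated how operators are displayed in data processing tasks • Added: retry count display, with logs viewable per retry attempt |
| Data Management | • Added: data lineage page for tracing lineage across data collection, data management, and data cleaning • Added: folder upload; frontend preview of text and image file contents; frontend view of labels on annotated files • Fixed: metadata of files not yet stored in the database could not be viewed and their contents could not be previewed |
| Data Annotation | • Improved: consolidated and refined the sync logic between assisted and manual annotation, enabling bidirectional sync between annotation tool data and dataset data • Improved: primary actions (sync, edit) stay at the top level, while infrequent actions (delete task, edit task dataset, export annotation results) move to a secondary menu. • Fixed: frontend display issue caused by incorrect pagination parameters |
| Operator Market | • Added: operators categorized by functionality • Added: operator documentation and examples • Improved: frontend display (Card, OperatorServiceMonitor, Requirement, and other components) |
| Knowledge Generation | • Improved: knowledge graph now uses a 2D force-directed graph, with adjusted rendering logic |
| Models and Configuration | • Refactored: unified model configuration and standardized LLM client factory • Improved: user context tracking during session and model creation |
| Deployment | • Added: Docker images tagged by branch • Added: MinerU adapted for 310P, with optimized build parameters • Added: SECURITY.md security policy • Fixed: database log directory permissions; incorrect image-similarity filtering when a task is retried repeatedly • Fixed: missing three.js dependency at the build stage |
| User Management | • Added: data collection, datasets, cleaning, synthesis, ratio, evaluation, knowledge bases, and operators are isolated by creator; system preset data is not isolated • Added: DataSetScope data permissions and UserContext propagation |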
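The API Reader entry in the table above describes collecting API response data as CSV files. The following is only a minimal sketch of that general idea, assuming the API returns a JSON array of flat objects. The function name `records_to_csv` and the sample payload are hypothetical and do not reflect the plugin's actual implementation or configuration.

```python
import csv
import io
import json


def records_to_csv(payload: str) -> str:
    """Convert a JSON array of flat objects into CSV text (illustrative only)."""
    records = json.loads(payload)

    # Build the header from the union of keys, in first-seen order,
    # so records with extra fields do not break the writer.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a record are written as empty cells (DictWriter default).
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical API response body, used here in place of a real HTTP call.
sample = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "tag": "x"}]'
csv_text = records_to_csv(sample)
print(csv_text)
```

Taking the union of keys for the header is one possible design choice; a real collector might instead use a fixed schema or flatten nested fields.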
What's Changed
- feat(data-management): add preview functionality for text and image items by @o0Shark0o in #259
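- fix: add three.js dependency at the build stage by @Dallas98 in #261
- fix: add three dependency by @hhhhsc701 in #262
- fix: change database log directory permissions / fix incorrect filtering when image similarity tasks are retried repeatedly by @hhhhsc701 in #264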
- Fix: Annotation template paginate can not show when template count > 12 by @q792602257 in #260
- feat: refactor KnowledgeGraphView to use 2D force graph and improve rendering logic by @Dallas98 in #266
- fix: pagination parameter issue by @q792602257 in #265
- feat(data-management): fix bulk upload issues and enhance UI & upload experience by @o0Shark0o in #269
- feat(auto-annotation): sync tags and timestamps to datasets and optimize visibility by @o0Shark0o in #271
- feat: optimize mineru build and deployment parameters, adapt to 310P by @hhhhsc701 in #270
- Update operator frontend display by @hhhhsc701 in #273
- realize that data sets, data cleaning, and knowledge bases are isolated according to the creator, and operators are not isolated. by @hefanli in #268
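- feat: add deepwiki link by @hhhhsc701 in #278
- feat: add retry count recording and log display for cleaning tasks by @hhhhsc701 in #280
- feat: switch database to singleton pattern by @hhhhsc701 in #281
- feat: add operator documentation and examples by @hhhhsc701 in #282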
- feat(annotation): add bidirectional sync and flexible export for annotation tasks by @o0Shark0o in #284
- feat: enhance dataset file download functionality to zip all files and improve path validation by @Dallas98 in #285
- add data lineage page and data quality page by @hefanli in #287
- Enhance README with new fields and unit updates by @hhhhsc701 in #288
- feat: update Docker image tagging to use branch names for better identification by @Dallas98 in #289
- feat: rename data cleansing references to data processing for consistency by @Dallas98 in #292
- Create SECURITY.md for security policy by @yafengzhang2025 in #293
- feat(annotation): simplify task creation and streamline sync workflow by @o0Shark0o in #294
- Add database insertion by @hhhhsc701 in #295
- Add null value handling by @hhhhsc701 in #298
- feat(annotation): refine sync behavior and add annotation export option by @o0Shark0o in #299
- fixed the problem that the metadata of files that have not yet been stored in the database cannot be previewed. by @hefanli in #300
- add apireader collection plug-in by @hefanli in #301
- feature: added universal relational database collection template by @hefanli in #302
- fixed issues where files inside folders could not be previewed and deleted by @hefanli in #303
- refactor(models): unify model configuration and standardize LLM client factory by @Dallas98 in #283
- feat(operator-market): reorganize operator categories by functionality by @o0Shark0o in #304
- feat: enhance user tracking in session management and model creation by using effective user context by @Dallas98 in #305
- fix: adapt to structural changes of COT data by @hefanli in #306
Full Changelog: v0.2.0...v0.3.0