diff --git a/230. Cloud/Azure/11. Data Tools/Azure Data Factory/README.md b/230. Cloud/Azure/11. Data Tools/Azure Data Factory/README.md
index 9257302e..023653de 100644
--- a/230. Cloud/Azure/11. Data Tools/Azure Data Factory/README.md
+++ b/230. Cloud/Azure/11. Data Tools/Azure Data Factory/README.md
@@ -1,29 +1,133 @@
-## 介绍 Introduction
+## 1. Introduction
-Azure data factory (ADF) 是 Azure 提供的可横向扩张的 (scale out) 无服务的 (serverless) 的数据相关的一项服务。[^1]
+Azure Data Factory (ADF) is a scale-out, serverless data integration and data migration service. [["]](https://docs.microsoft.com/en-us/azure/data-factory/)
主要包含以下三个方面:
-- 数据集成 (data integration) :与不同数据源结合的能力。[^3]
-- 数据转换 (data transformation) :数据从一种格式转换成另一种格式的能力。[^4]
-- SSIS (SQL Server Integration Services) : 复制或下载文件,加载数据仓库,清除和挖掘数据以及管理 SQL Server 对象和数据。[^2]
+- Data integration: the ability to connect to and combine different data sources. [["]](https://en.wikipedia.org/wiki/Data_integration)
+- Data transformation: the ability to convert data from one format to another. [["]](https://en.wikipedia.org/wiki/Data_transformation)
+- SSIS (SQL Server Integration Services): copies or downloads files, loads data warehouses, cleanses and mines data, and manages SQL Server objects and data. [["]](https://docs.microsoft.com/zh-cn/sql/integration-services/sql-server-integration-services?view=sql-server-ver15)
-ADF 有版本区分,因此在 StackOverflow 上搜索,需要注意看标签是否带 v2。ADF 对标 AWS 的是 [AWS Data Pipeline](https://aws.amazon.com/cn/datapipeline)。
+## 2. Features
+General:
+- **Execute Pipeline**: runs a pipeline. In the monitor you can view a pipeline's input parameters and rerun it; keep this in mind when defining pipelines.
+- Arrays (upper limit: 100,000 items) [["]](https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity)
-## 延伸阅读 See also
+ - **[ForEach](https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity)**: loops over an array. Parallelism is capped at 50, and ForEach activities cannot be nested, but nesting pipelines works around this. [["]](https://learn.microsoft.com/en-us/fabric/data-factory/data-factory-limitations#data-pipeline-resource-limits)
+ - ...
+- Get Metadata: retrieves a file's metadata.
+
+- Lookup: retrieves data through a dataset. Output is capped at 4 MB and 5,000 rows.
+
+ - Workaround: if the data source has an index, this can be implemented with a loop or a utility. (💡The [official workarounds](https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity#limitations-and-workarounds) are too vague to be usable.)
+
+- Web: HTTP operations
+- Webhook
+
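The index-based Lookup workaround above can be sketched as plain pagination logic. This is a minimal sketch; `query_page` is a hypothetical stand-in for a single Lookup call against an indexed source.

```python
# Sketch of the index-based workaround for the Lookup activity's
# 5,000-row cap: page through the indexed source in chunks no larger
# than the limit, and aggregate the results.

LOOKUP_ROW_LIMIT = 5000

def query_page(source, offset, limit):
    """Hypothetical stand-in for one Lookup call against an indexed source."""
    return source[offset:offset + limit]

def lookup_all(source, page_size=LOOKUP_ROW_LIMIT):
    rows, offset = [], 0
    while True:
        page = query_page(source, offset, page_size)
        rows.extend(page)
        if len(page) < page_size:  # final (partial or empty) page
            return rows
        offset += page_size

data = list(range(12345))          # would take 3 Lookup calls
assert lookup_all(data) == data
```

Each `query_page` call corresponds to one Lookup activity execution, so a 12,345-row source costs three calls instead of failing outright.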
+Data operations:
+
+- Copy Activity: copies data.
+- Data Flow: copies and transforms data; more complex than Copy Activity.
+
+
+
+
+
+### 2.1. Copy Activity
+
+CosmosDB:
+
+- For migration, prefer DB => Storage => DB (direct DB => DB copies frequently fail).
+- `Request Size = Single Document Size * Write Batch Size` [["]](https://learn.microsoft.com/en-us/answers/questions/69129/copy-from-cosmosdb-to-cosmosdb-error-34-request-si). Setting the write batch size too high can trip Cosmos DB's 2 MB request-size limit; a batch size of 1 avoids this for large documents.
+- A single Cosmos DB document is capped at 2 MB, but Copy Activity tops out at roughly 1.7 MB (reason unknown).
+- Data Flow can insert 2 MB documents, but may then raise odd errors.
+- The lower the parallelism setting, the less throughput is consumed. [["]](https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#parallel-copy)
+- DIU (Data Integration Unit): a measure of copy compute power. [["]](https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#data-integration-units)
+- Performance tuning: [Performance tuning steps](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#performance-tuning-steps)
+- [Schema mapping](https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping) includes operations such as the flatten transformation.
+ - Flatten unrolls a list nested inside one record into multiple records.
+- Enabling `validateDataConsistency` verifies consistency between source and sink. [["]](https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-data-consistency)
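The request-size relationship above reduces to simple arithmetic; a sketch for picking a write batch size (the document sizes are made-up examples):

```python
# Request Size = Single Document Size * Write Batch Size (per the linked
# answer). Choose the largest write batch size that keeps one request
# under the Cosmos DB 2 MB request limit.

COSMOS_REQUEST_LIMIT = 2 * 1024 * 1024  # 2 MiB

def max_write_batch_size(single_document_size):
    """Largest write batch size whose request stays within the limit."""
    return max(1, COSMOS_REQUEST_LIMIT // single_document_size)

print(max_write_batch_size(16 * 1024))    # 16 KiB documents -> 128
print(max_write_batch_size(1800 * 1024))  # ~1.8 MiB documents -> 1
```

Note that with near-2 MB documents the formula already forces a batch size of 1, which matches the failure mode described above.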
+
+
+
+
+
+### 2.2. Data Flow
+
+Data Flow is used for data transformation.
+
+1. Data Flow is typically used to transform databases and large files; HTTP endpoints usually rate-limit requests per minute, so they are a poor fit.
+2. Data Flow is not a backup tool: data imported through it may be lossy (Boolean => String, Integer => String).
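A toy illustration of that kind of type loss (not ADF itself, just the general round-trip problem):

```python
# Type loss on export: once Boolean and Integer values are serialized
# as strings, the original types cannot be recovered unambiguously.
row = {"active": True, "count": 1}
exported = {k: str(v) for k, v in row.items()}  # Boolean/Integer -> String
print(exported)  # {'active': 'True', 'count': '1'}
assert exported != row  # the round trip is lossy
```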
+
+The [official docs](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-transformation-overview) provide the following transformations. They relate to these concepts:
+
+- stream
+- MS SQL
+
+| Name | Category | Description |
+| :----------------------------------------------------------- | :---------------------- | :----------------------------------------------------------- |
+| [Aggregate](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-aggregate) | Schema modifier | Define different types of aggregations such as SUM, MIN, MAX, and COUNT grouped by existing or computed columns. |
+| [Alter row](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-alter-row) | Row modifier | Set insert, delete, update, and upsert policies on rows. |
+| [Conditional split](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-conditional-split) | Multiple inputs/outputs | Route rows of data to different streams based on matching conditions. |
+| [Derived column](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-derived-column) | Schema modifier | Generate new columns or modify existing fields using the data flow expression language. |
+| [Exists](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-exists) | Multiple inputs/outputs | Check whether your data exists in another source or stream. |
+| [Filter](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-filter) | Row modifier | Filter a row based upon a condition. |
+| [Flatten](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-flatten) | Schema modifier | Take array values inside hierarchical structures such as JSON and unroll them into individual rows. |
+| [Join](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-join) | Multiple inputs/outputs | Combine data from two sources or streams. |
+| [Lookup](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-lookup) | Multiple inputs/outputs | Reference data from another source. |
+| [New branch](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-new-branch) | Multiple inputs/outputs | Apply multiple sets of operations and transformations against the same data stream. |
+| [Parse](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-parse) | Formatter | Parse text columns in your data stream that are strings of JSON, delimited text, or XML formatted text. |
+| [Pivot](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-pivot) | Schema modifier | An aggregation where one or more grouping columns has its distinct row values transformed into individual columns. |
+| [Rank](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-rank) | Schema modifier | Generate an ordered ranking based upon sort conditions |
+| [Select](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-select) | Schema modifier | Alias columns and stream names, and drop or reorder columns |
+| [Sink](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-sink) | - | A final destination for your data |
+| [Sort](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-sort) | Row modifier | Sort incoming rows on the current data stream |
+| [Source](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-source) | - | A data source for the data flow |
+| [Surrogate key](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-surrogate-key) | Schema modifier | Add an incrementing non-business arbitrary key value |
+| [Union](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-union) | Multiple inputs/outputs | Combine multiple data streams vertically |
+| [Unpivot](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-unpivot) | Schema modifier | Pivot columns into row values |
+| [Window](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-window) | Schema modifier | Define window-based aggregations of columns in your data streams. |
+
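As a plain-data illustration of what the Flatten transformation does (unrolling a nested array into one row per element), a minimal sketch:

```python
# Flatten: unroll an array nested inside a hierarchical record (e.g. JSON)
# into one flat row per array element, copying the parent fields down.

def flatten(records, array_field):
    rows = []
    for rec in records:
        parent = {k: v for k, v in rec.items() if k != array_field}
        for item in rec.get(array_field, []):
            rows.append({**parent, **item})
    return rows

orders = [{"order": 1, "items": [{"sku": "A"}, {"sku": "B"}]},
          {"order": 2, "items": [{"sku": "C"}]}]
print(flatten(orders, "items"))
# -> [{'order': 1, 'sku': 'A'}, {'order': 1, 'sku': 'B'}, {'order': 2, 'sku': 'C'}]
```

This is the hierarchical-to-tabular shape change that makes JSON loadable into a relational sink.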
+
+
+
+
+
+
+## 3. Monitoring
+
+Options include:
+
+- alert notifications
+- checking the logs at the configured log path
+- checking the dashboard
+
+
+
+
+
+
+
+
+
+
+
+## Tidbits
+
+- ADF has multiple versions, so when searching StackOverflow, check whether the question is tagged v2. ADF's AWS counterpart is [AWS Data Pipeline](https://aws.amazon.com/cn/datapipeline).
- [ADF 反馈网站](https://feedback.azure.com/d365community/forum/1219ec2d-6c26-ec11-b6e6-000d3a4f032c#)
+- [ADF solution templates](https://learn.microsoft.com/en-us/azure/data-factory/solution-templates-introduction)
+
+
-[^1]: [Azure Data Factory documentation](https://docs.microsoft.com/en-us/azure/data-factory/)
-[^2]: [SQL Server Integration Services](https://docs.microsoft.com/zh-cn/sql/integration-services/sql-server-integration-services?view=sql-server-ver15)
-[^3]: [Data integration - Wikipedia](https://en.wikipedia.org/wiki/Data_integration)
-[^4]:[Data transformation - Wikipedia](https://en.wikipedia.org/wiki/Data_transformation)
diff --git "a/230. Cloud/Azure/11. Data Tools/Azure Data Factory/\345\212\237\350\203\275\344\270\200\346\240\217.md" "b/230. Cloud/Azure/11. Data Tools/Azure Data Factory/\345\212\237\350\203\275\344\270\200\346\240\217.md"
deleted file mode 100644
index 8b4d3e0d..00000000
--- "a/230. Cloud/Azure/11. Data Tools/Azure Data Factory/\345\212\237\350\203\275\344\270\200\346\240\217.md"
+++ /dev/null
@@ -1,242 +0,0 @@
-
-
-## 数据工厂限制🚫
-
-数据工厂是多租户服务 (multitenant service)[^9] ,因此具有上限。具体参考[官网](https://docs.microsoft.com/en-US/azure/azure-resource-manager/management/azure-subscription-service-limits#data-factory-limits)。下面举一些例子
-
-- ForEach 并行数 ≤ 50
-
-- linked service ≤ 3000
-
- 当超过上限时,将会抛出以下类似的错误异常
-
- >There are substantial concurrent copy activity executions which is causing failures due to throttling under subscription xxxx, region jpe and limitation 3000. Please reduce the concurrent executions. For limits, refer
-
- 经过实验,可以同时启动 1500 个 Copy Activity,*也许*是因为每一个 Copy Activity 有 2 个 Linked Service。
-
-## Copy Activity
-
-### 概念
-
-source: 数据源
-
-[sink](https://en.wikipedia.org/wiki/Sink_(computing)): 接收器 (原意: 水槽,洗碗槽)
-
-Hierarchical 分层:JSON、XML、NoSQL
-
-tabular : 表格(excel、关系数据库)
-
-### 性能
-
-概念📙
-
-DIU (Data Integration Unit) [^1]这是 Azure云 特有的概念,介绍的文档比较少且模糊不清,笔者认为应解释为 "单位时间内,CPU、内存、网络资源分配等消耗的时间"
-
-策略♞
-
-- [For Each ](https://docs.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity) 拆分需要拷贝的数据,并行执行。
-- Copy Activity 的性能
- ![监视复制活动运行详细信息](/assets/blog_res/azure/monitor-copy-activity-run-details.png)
- 1. Azure 提供了[性能优化 (performance tuning) 提示](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance-troubleshooting)功能
- - [并行数的调优](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#parallel-copy)
- - [颗粒大小的调优](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#data-integration-units)
- 2. **Duration** 的内容常为优化的对象。[^3]
- 3. [暂存 (staging) 功能](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#staged-copy) (Specify whether to copy data via an interim staging store. Enable staging only for the beneficial scenarios, e.g. load data into Azure Synapse Analytics via PolyBase, load data to/from Snowflake, load data from Amazon Redshift via UNLOAD or from HDFS via DistCp, etc.[Learn more](https://go.microsoft.com/fwlink/?linkid=2159335))
-
-测试步骤🧪[^7]
-
-1. 选择大数据
-2. 输入 [Data Integration Units (DIU)](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#data-integration-units) 和 [parallel copy](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#parallel-copy),并不断调试,最终获取最优数值
-3. 拆分需要拷贝的数据,并聚合结果。以下是官方的模板:
- - [Copy files from multiple containers](https://docs.microsoft.com/en-us/azure/data-factory/solution-template-copy-files-multiple-containers)
- - [Migrate data from Amazon S3 to ADLS Gen2](https://docs.microsoft.com/en-us/azure/data-factory/solution-template-migration-s3-azure)
- - [Bulk copy with a control table](https://docs.microsoft.com/en-us/azure/data-factory/solution-template-bulk-copy-with-control-table)
-
-
-
-### Schema映射 Schema Mapping
-
-Copy Activity 有一系列默认的映射策略。而配置显式映射 (Explicit mapping) 时,需加注意,不同的 source-sink 组合配置的方式是不同。[^2]
-
-![从表格映射到表格](/assets/blog_res/azure/map-tabular-to-tabular.png)
-
-Mapping 支持 *Fatten* 操作,可以讲一个 array 扁平化。这方便 JSON 转换成 table
-
-![使用 UI 从分层映射到表格](/assets/blog_res/azure/map-hierarchical-to-tabular-ui.png)
-
-
-
-
-
-
-
-### 数据一致性验证 Data consistency verification
-
-Copy Activity 提供了数据
一致性验证
。通过 `validateDataConsistency` 启动该校验。[^5]
-
-校验的*对象*以及*策略*♘
-
-- 二进制对象:file size, lastModifiedDate, MD5 checksum
-- 表格数据(tabular data):` 读取的行数 = 复制的行数 + 跳过的行数`
-
-*什么时候发生?*📅[^4]
-
-- 主键重复
-- 作为 source 的二进制文件不能访问、被删除
-
-当数据发生 *不一致性*⚠️时,可以通过 `dataInconsistency` 设置行为
-
-- 中止
-- 跳跃
-
-在设定 `logSettings` 和 `path` 可以记录 *不一致* 时候的日志。
-
-### 监控·容错·测试 Monitor·Fault tolerance·Test
-
-💿数据不一致
-
-当 *不允许数据不一致* 那么 Copy Activity 将重试或者中止。中止时,pipeline 将以失败的形式返回,此时可以
-
-1. 发送邮件通知
-2. 定期查看 监控 (monitor) 情况
-
-当 *允许数据不一致* 时,可以监控以下数据,并根据所得数据进行下一步策略下一步策略。[^4]
-
-- activity结果 (`@activity('Copy data').output`) [^6]
-- 日志文件
-
-📏测试
-
-可通过来回复制进行数据校验进行实现,示例如下:
-
-1. 备份 数据库-1 至 Azure Blob Storage
-2. Azure Blob Storage 将备份数据恢复至 数据库-2
-3. 数据库-1 和 数据库-2 的数据进行一一比较。
-
-目的: 数据在传输中是否有不可预料损失和变形。
-
-📝特殊需求
-
-监控 Copy Activity 的运行时长,当时长过长时,发送监控信息至运维人员。[^6]
-
-### 其他
-
-- 压缩功能
-
-
-
-## Data flow
-
-Data flow 用于数据转换。
-
-1. Data flow 一般用于对数据库、大文件进行转换,HTTP协议 一般会限制每分钟访问的速率。
-2. Data flow 不是用于备份数据,从 Data flow 中导入后,数据可能会有损失(Boolean=>String,integer=>String)
-
-[官网](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-transformation-overview)提供了以下工具进行数据转换。工具以下概念相关
-
-- stream
-- MS SQL
-
-| Name | Category | Description |
-| :----------------------------------------------------------- | :---------------------- | :----------------------------------------------------------- |
-| [Aggregate](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-aggregate) | Schema modifier | Define different types of aggregations such as SUM, MIN, MAX, and COUNT grouped by existing or computed columns. |
-| [Alter row](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-alter-row) | Row modifier | Set insert, delete, update, and upsert policies on rows. |
-| [Conditional split](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-conditional-split) | Multiple inputs/outputs | Route rows of data to different streams based on matching conditions. |
-| [Derived column](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-derived-column) | Schema modifier | generate new columns or modify existing fields using the data flow expression language. |
-| [Exists](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-exists) | Multiple inputs/outputs | Check whether your data exists in another source or stream. |
-| [Filter](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-filter) | Row modifier | Filter a row based upon a condition. |
-| [Flatten](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-flatten) | Schema modifier | Take array values inside hierarchical structures such as JSON and unroll them into individual rows. |
-| [Join](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-join) | Multiple inputs/outputs | Combine data from two sources or streams. |
-| [Lookup](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-lookup) | Multiple inputs/outputs | Reference data from another source. |
-| [New branch](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-new-branch) | Multiple inputs/outputs | Apply multiple sets of operations and transformations against the same data stream. |
-| [Parse](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-new-branch) | Formatter | Parse text columns in your data stream that are strings of JSON, delimited text, or XML formatted text. |
-| [Pivot](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-pivot) | Schema modifier | An aggregation where one or more grouping columns has its distinct row values transformed into individual columns. |
-| [Rank](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-rank) | Schema modifier | Generate an ordered ranking based upon sort conditions |
-| [Select](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-select) | Schema modifier | Alias columns and stream names, and drop or reorder columns |
-| [Sink](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-sink) | - | A final destination for your data |
-| [Sort](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-sort) | Row modifier | Sort incoming rows on the current data stream |
-| [Source](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-source) | - | A data source for the data flow |
-| [Surrogate key](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-surrogate-key) | Schema modifier | Add an incrementing non-business arbitrary key value |
-| [Union](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-union) | Multiple inputs/outputs | Combine multiple data streams vertically |
-| [Unpivot](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-unpivot) | Schema modifier | Pivot columns into row values |
-| [Window](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-window) | Schema modifier | Define window-based aggregations of columns in your data streams. |
-| [Parse](https://docs.microsoft.com/en-us/azure/data-factory/data-flow-parse) | Schema modifier | Parse column data to Json or delimited text |
-
-
-
-
-
-## 控制流 Control Flow
-
-- **Execute Pipeline**: 执行管道。通过 monitor 可以看到 pipeline 的输入参数、重新执行 pipeline。在定义 pipeline 时,需要注意这点。
-
-- 数组(上限 100,000) [^8]
-
- - **Append Variable**: 追加变量到数组里。
- - **Filter**: 过滤数组
- - **ForEach**: 循环数组。
- - 最大并行为 50,默认为 20,如需扩展则要多重 ForEach (Execute Pipeline + ForEach 的方式)。
- - 测试结果显示,设置最大并行数设置过高时,是按照最低数来执行。(💡那为何不全自动化呢?)
- - ForEach 的限制很多。
- - **Until**
-
-- 输入
-
- - **Get Metadata**: 获得文件的元数据。元数据不得超过 4 MB
-
- - **Lookup**: 通过 dataset 获得数据。
-
- - 输出最大支持 4 MB,如果大小超过此限制,活动将失败。
-
- - 最多可以返回 5000 行;如果结果集包含的记录超过此范围,将返回前 5000 行。
-
- - 突破方式: 如果数据源有 index 的话,可以通过循环或者 util 的形式实现。
-
- (💡[官方的 workarounds](https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity#limitations-and-workarounds) 太模糊,无法参考使用)
-
- - **Web**:
-
-- 输出
-
- - **Web**: 可以发送各种数据。另外还可以将 datasets 和 linkedServices 发送出去。
- - **webhook**
-
-- 条件语句
-
- - **If Condition**: if 语句
- - **Switch**
- - **Validation**: 等待文件。当文件或文件夹存在时,才能继续下一步。
- - **wait**: 等待一段时间后再执行下一步。
-
-- **Set Variable**: 设置变量
-
-## Delete Activity
-
-Delete Activity 仅仅用于删除文件。如需定时删除文件,则要与 schedule trigger 一起使用。
-
-
-
-## 外部服务
-
-### Databricks
-
-Azure Databricks 基于 Apache Spark 的快速、简单、协作分析平台
-
-### Azure Data Explorer
-
-数据分析
-
-
-
-
-
-[^1]: [Data Integration Units](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features#data-integration-units)
-[^2]: [Schema and data type mapping in copy activity - Microsoft Docs](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping)
-
-[^3]:[Troubleshoot copy activity on Azure IR](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance-troubleshooting#troubleshoot-copy-activity-on-azure-ir)
-[^4]: [Fault tolerance](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance)
-[^5]: [Data consistency verification in copy activity - Azure](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-data-consistency)
-[^6]: [Monitor copy activity](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-monitoring)
-[^7]: [Performance tuning steps](https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#performance-tuning-steps)
-[^8]: [ForEach Activity](https://docs.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity)
-[^9]: [Data Factory limits](https://docs.microsoft.com/en-US/azure/azure-resource-manager/management/azure-subscription-service-limits#data-factory-limits)
\ No newline at end of file
diff --git a/230. Cloud/Azure/11. Data Tools/README.md "b/230. Cloud/Azure/11. Data Tools/\346\225\260\346\215\256\345\244\207\344\273\275.md"
similarity index 73%
rename from 230. Cloud/Azure/11. Data Tools/README.md
rename to "230. Cloud/Azure/11. Data Tools/\346\225\260\346\215\256\345\244\207\344\273\275.md"
index 5f0bb6b4..cbf818f8 100644
--- a/230. Cloud/Azure/11. Data Tools/README.md
+++ "b/230. Cloud/Azure/11. Data Tools/\346\225\260\346\215\256\345\244\207\344\273\275.md"
@@ -8,19 +8,19 @@
数据工具分为以下:
-1. 托管备份工具。如 CosmosDB 可选持续备份或者定期备份,但该工具只能在该服务里使用。
+1. **Service-native backup**. For example, Cosmos DB offers continuous or periodic backup, but this tooling is only available within that service.
这种托管工具的持续备份通常十分成熟,可以恢复到某一个具体的时刻(point-to-restore),我们只需要在上面点击一下备份保存时间即可。
-2. 备份平台,如:**Azure Recovery Services Vault(RSV)** 和 **Azure Backup Vault(BV)**
+2. **Backups via a backup platform**. For example, **Azure Recovery Services Vault (RSV)** and **Azure Backup Vault (BV)**
是(1)的补足。在这里,备份文件是托管的,可统一设置备份策略。但只能定期备份。
-3. 传统备份
+3. **Traditional backup**
如我们使用 virtualbox 的时,可以手动制作 snapshot 作为备份。Azure snapshot 也是类似的存在。
-4. 综合性数据工作: 如 Azure Data Factory
+4. **Other tools**: such as Azure Data Factory. See [数据复制](./数据复制.md).
用于补足上述场景无法实现的操作。例,CosmosDB 无法复制数据,有时候我们需要复制数据做测试,这时候就需要外部工具(data factory)了。
@@ -51,7 +51,7 @@ Recovery Services vault 是一个存储的仓库。而实际做备份和恢复
-## 3. 虚拟机的备份策略
+## 3. Virtual Machine Backup
虚拟机的备份与恢复有若干种策略:
@@ -100,45 +100,13 @@ Recovery Services vault 是一个存储的仓库。而实际做备份和恢复
-## 4. 数据工作
-![image-20240424132626001](https://raw.githubusercontent.com/caliburn1994/caliburn1994.github.io/dev/images/20240424132628.png)
-数据同步有几种方式
-- [Data factory](Azure%20Data%20Factory): 综合性比较好,各种迁移工具都有。可重复执行。
-- AzCopy: 本地命令工具。**适用:复制 blob 和 file。**
-- Azure Import/Export: 数据装进硬盘里,发送到 Azure 的数据中心或者发过来发送到客户手中。**适用:复制 blob 和 file。**
-
-### 4.1. Azure Import/Export
-Azure Import/Export:
-- Import: 数据装进硬盘里,发送到 Azure 的数据中心。(支持服务: Azure Blob storage、Azure Files) [[”]](https://learn.microsoft.com/en-us/azure/import-export/storage-import-export-service)
- 所需配置文件:
-
- - a dataset CSV file: 文件信息
- - a driveset CSV file: 驱动信息
-- Export: 从数据中心将数据取出来,存到硬盘发送到客户手上。
-
-
-
-### 4.2. Azcopy
-
-**AzCopy** 是命令行工具,可从数据源下载到本地,或者从本地上传。 [[”]](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10#download-azcopy)
-
-- Azure Blob Storage、Azure File
-- Azure Table 只支持旧版本 ll **[AzCopy version 7.3](https://aka.ms/downloadazcopynet)** ,新本不支持
-
-| 命令行 | 说明 |
-| ----------- | ---------------------------------- |
-| azcopy make | Creates a container or file share. |
-
-**QA-1: AzCopy 连接 Azure Blob Storage、Azure File 通过什么方式验证?**
-
-A: Microsoft Entra ID 、a Shared Access Signature (SAS) token
diff --git "a/230. Cloud/Azure/11. Data Tools/\346\225\260\346\215\256\345\244\215\345\210\266.md" "b/230. Cloud/Azure/11. Data Tools/\346\225\260\346\215\256\345\244\215\345\210\266.md"
new file mode 100644
index 00000000..f4a413ef
--- /dev/null
+++ "b/230. Cloud/Azure/11. Data Tools/\346\225\260\346\215\256\345\244\215\345\210\266.md"
@@ -0,0 +1,71 @@
+## 1. Introduction
+
+![image-20240703114404592](https://raw.githubusercontent.com/caliburn1994/caliburn1994.github.io/dev/images/20240703114408.png)
+
+Data copy tools include:
+
+- [Data factory](Azure%20Data%20Factory): a comprehensive data copy and aggregation tool that runs in the cloud.
+- [Azure Cosmos DB Desktop Data Migration Tool](https://github.com/AzureCosmosDB/data-migration-desktop-tool): a comprehensive data copy tool that runs on a local machine.
+- AzCopy: a local command-line tool. **Use for: copying blobs and files.**
+- Azure Import/Export: data is loaded onto physical drives and shipped to an Azure datacenter, or shipped from Azure to the customer. **Use for: copying blobs and files.**
+
+
+
+## 2. Tools
+
+### 2.1. Azure Import/Export
+
+Azure Import/Export:
+
+- Import: load data onto hard drives and ship them to an Azure datacenter. (Supported services: Azure Blob Storage, Azure Files) [["]](https://learn.microsoft.com/en-us/azure/import-export/storage-import-export-service)
+
+ Required configuration files:
+
+ - a dataset CSV file: describes the file information
+ - a driveset CSV file: describes the drive information
+
+- Export: data is retrieved from the datacenter, written to drives, and shipped to the customer.
+
+
+
+### 2.2. AzCopy
+
+**AzCopy** is a command-line tool for downloading from a data source to the local machine, or uploading from local. [["]](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10#download-azcopy)
+
+- Supports Azure Blob Storage and Azure Files
+- Azure Table storage is only supported by the legacy **[AzCopy version 7.3](https://aka.ms/downloadazcopynet)**; newer versions do not support it
+
+| Command | Description |
+| ----------- | ---------------------------------- |
+| azcopy make | Creates a container or file share. |
+
+**QA-1: How does AzCopy authenticate to Azure Blob Storage and Azure Files?**
+
+A: Microsoft Entra ID or a Shared Access Signature (SAS) token
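For the SAS option, the token is appended to the resource URL as a query string; a minimal sketch of building such an AzCopy invocation (the account, container, and token are placeholder values, not real credentials):

```python
# SAS authentication: AzCopy accepts the SAS token appended to the
# Blob/File resource URL as a query string.

def with_sas(resource_url, sas_token):
    """Append a SAS token to a Blob/File resource URL."""
    return f"{resource_url}?{sas_token.lstrip('?')}"

src = with_sas("https://myaccount.blob.core.windows.net/mycontainer",
               "sv=2023-01-03&sig=REDACTED")
# The resulting command line (not executed here):
print(["azcopy", "copy", src, "./local-dir", "--recursive"])
```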
+
+
+
+
+
+## 3. Validation
+
+Taking Cosmos DB as an example, the source data should include:
+
+- data near the various limits, e.g. documents close to 2 MB or with deeply nested structures
+- dates and decimal values
+
+Validation techniques:
+
+- spot-check sampling
+- one-to-one comparison of source and target data
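The one-to-one comparison can be sketched by fingerprinting every document by id on both sides and diffing the results (a sketch; it assumes each document carries a unique `id` field):

```python
# One-to-one validation sketch: fingerprint every document by its id on
# both sides, then report ids that are missing or whose content differs.
import hashlib
import json

def fingerprint(docs):
    return {d["id"]: hashlib.sha256(
                json.dumps(d, sort_keys=True, default=str).encode()
            ).hexdigest()
            for d in docs}

def compare(source_docs, target_docs):
    src, tgt = fingerprint(source_docs), fingerprint(target_docs)
    missing = sorted(src.keys() - tgt.keys())
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return missing, changed

source = [{"id": 1, "flag": True}, {"id": 2, "n": 3}]
target = [{"id": 1, "flag": "True"}]  # a Boolean silently became a String
print(compare(source, target))  # -> ([2], [1])
```

Hashing a canonical JSON form also catches the Boolean/Integer => String losses noted in the Data Flow section.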
+
+
+
+
+
+
+
+
+
+
+
diff --git "a/230. Cloud/Azure/11. Data Tools/\350\277\201\347\247\273\344\270\216\345\244\207\344\273\275.xmind" "b/230. Cloud/Azure/11. Data Tools/\350\277\201\347\247\273\344\270\216\345\244\207\344\273\275.xmind"
index 3348c50b..4efb1cc1 100644
Binary files "a/230. Cloud/Azure/11. Data Tools/\350\277\201\347\247\273\344\270\216\345\244\207\344\273\275.xmind" and "b/230. Cloud/Azure/11. Data Tools/\350\277\201\347\247\273\344\270\216\345\244\207\344\273\275.xmind" differ
diff --git a/images/20240424132628.png b/images/20240424132628.png
deleted file mode 100644
index 7f1627c4..00000000
Binary files a/images/20240424132628.png and /dev/null differ