Commit b25ff7a

Merge pull request #886 from opendatalab/release-0.9.2
Release 0.9.2
2 parents 3fd024d + aeae1d0 commit b25ff7a

8 files changed, +98 -74 lines changed

README.md

+27-23
Original file line number | Diff line number | Diff line change
@@ -42,7 +42,7 @@
4242
</div>
4343

4444
# Changelog
45-
- 2024/11/06 0.9.1 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition functionality.
45+
- 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition functionality.
4646
- 2024/10/31 0.9.0 released. This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:
4747
- Refactored the sorting module code to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading order sorting, ensuring high accuracy in various layouts.
4848
- Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios.
@@ -138,13 +138,14 @@ There are three different ways to experience MinerU:
138138
- [Quick CPU Demo (Windows, Linux, Mac)](#quick-cpu-demo)
139139
- [Linux/Windows + CUDA](#Using-GPU)
140140

141-
**⚠️ Pre-installation Notice—Hardware and Software Environment Support**
142-
143-
To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.
144-
145-
By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.
146-
147-
In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
141+
> [!WARNING]
142+
> **Pre-installation Notice—Hardware and Software Environment Support**
143+
>
144+
> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.
145+
>
146+
> By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.
147+
>
148+
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
148149
149150
<table>
150151
<tr>
@@ -224,11 +225,13 @@ Refer to [How to Download Model Files](docs/how_to_download_models_en.md) for de
224225
After completing the [2. Download model weight files](#2-download-model-weight-files) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
225226
You can find the `magic-pdf.json` file in your 【user directory】.
226227

228+
> [!TIP]
227229
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
228230
229231
You can modify certain configurations in this file to enable or disable features, such as table recognition:
230232

231233

234+
> [!NOTE]
232235
> If the following items are not present in the JSON, please manually add the required items and remove the comment content (standard JSON does not support comments).
233236
234237
```json
@@ -257,13 +260,14 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi
257260
- [Ubuntu 22.04 LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_en_US.md)
258261
- [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
259262
- Quick Deployment with Docker
260-
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
261-
>
262-
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
263-
>
264-
> ```bash
265-
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
266-
> ```
263+
> [!IMPORTANT]
264+
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
265+
>
266+
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
267+
>
268+
> ```bash
269+
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
270+
> ```
267271
```bash
268272
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
269273
docker build -t mineru:latest .
@@ -325,8 +329,8 @@ The results will be saved in the `{some_output_dir}` directory. The output file
325329
├── some_pdf_spans.pdf # smallest granularity bbox position information diagram
326330
└── some_pdf_content_list.json # Rich text JSON arranged in reading order
327331
```
328-
329-
For more information about the output files, please refer to the [Output File Description](docs/output_file_en_us.md).
332+
> [!TIP]
333+
> For more information about the output files, please refer to the [Output File Description](docs/output_file_en_us.md).
330334
331335
### API
332336

@@ -377,12 +381,12 @@ TODO
377381

378382
# TODO
379383

380-
- 🗹 Reading order based on the model
381-
- 🗹 Recognition of `index` and `list` in the main text
382-
- 🗹 Table recognition
383-
- Code block recognition in the main text
384-
- [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
385-
- Geometric shape recognition
384+
- [x] Reading order based on the model
385+
- [x] Recognition of `index` and `list` in the main text
386+
- [x] Table recognition
387+
- [ ] Code block recognition in the main text
388+
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
389+
- [ ] Geometric shape recognition
386390

387391
# Known Issues
388392
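
The README.md changes above note that `magic-pdf.json` lives in the user directory and must be standard JSON with any comments removed. A minimal sketch of checking both points, assuming the file sits directly under the home directory as the tip states; the helper name `load_magic_pdf_config` is hypothetical and not part of magic-pdf:

```python
import json
from pathlib import Path


def load_magic_pdf_config(home=None):
    """Load magic-pdf.json from the user directory (hypothetical helper).

    Raises FileNotFoundError if the file is missing, and
    json.JSONDecodeError if it is not standard JSON, for example
    because comments were left in.
    """
    # Path.home() resolves to the per-platform user directory.
    home = Path(home) if home is not None else Path.home()
    config_path = home / "magic-pdf.json"
    with config_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Called with no argument, the helper looks in `Path.home()`; passing a directory explicitly is only a convenience for testing.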

README_zh-CN.md

+30-22
@@ -43,7 +43,7 @@
4343

4444
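# Changelog
4545

46-
- 2024/11/06 0.9.1 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition.
46+
- 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition.
4747
- 2024/10/31 0.9.0 released. This is a new version with extensive code refactoring that resolves numerous issues, improves performance, lowers hardware requirements, and improves usability:
4848
- Refactored the sorting module code to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading order sorting, ensuring very high accuracy across layouts.
4949
- Refactored the paragraph concatenation module to achieve good paragraph concatenation in cross-column, cross-page, cross-figure, and cross-table cases.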
@@ -139,13 +139,15 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
139139
- [Quick CPU Demo (Windows, Linux, Mac)](#使用cpu快速体验)
140140
- [Linux/Windows + CUDA](#使用gpu)
141141

142-
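**⚠️ Pre-installation Notice: Hardware and Software Environment Support**
143142

144-
To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This way, users who deploy and run the project on a recommended system configuration get the best performance and the fewest compatibility issues.
145-
146-
By focusing resources and effort on the mainline environment, our team can resolve potential bugs more efficiently and deliver new features promptly.
147-
148-
In non-mainline environments, because of the diversity of hardware and software configurations and compatibility issues with third-party dependencies, we cannot guarantee 100% availability of the project. For users who want to use this project in a non-recommended environment, we therefore suggest reading the documentation and FAQ carefully first, since most issues already have a solution in the FAQ. We also encourage community feedback so that we can gradually expand the range of supported environments.
143+
> [!WARNING]
144+
> **Pre-installation Notice: Hardware and Software Environment Support**
145+
>
146+
> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This way, users who deploy and run the project on a recommended system configuration get the best performance and the fewest compatibility issues.
147+
>
148+
> By focusing resources and effort on the mainline environment, our team can resolve potential bugs more efficiently and deliver new features promptly.
149+
>
150+
> In non-mainline environments, because of the diversity of hardware and software configurations and compatibility issues with third-party dependencies, we cannot guarantee 100% availability of the project. For users who want to use this project in a non-recommended environment, we therefore suggest reading the documentation and FAQ carefully first, since most issues already have a solution in the FAQ. We also encourage community feedback so that we can gradually expand the range of supported environments.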
149151
150152
<table>
151153
<tr>
@@ -211,7 +213,8 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
211213

212214
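#### 1. Install magic-pdf
213215

214-
Synchronization of the latest version to mirrors in China may be delayed, so please be patient.
216+
> [!NOTE]
217+
> Synchronization of the latest version to mirrors in China may be delayed, so please be patient.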
215218
216219
```bash
217220
conda create -n MinerU python=3.10
@@ -227,10 +230,13 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
227230

228231
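After completing the [2. Download model weight files](#2-下载模型权重文件) step, the script automatically generates the magic-pdf.json file in the user directory and configures the default model path.
229232
You can find the magic-pdf.json file in your 【user directory】.
233+
234+
> [!TIP]
230235
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
231236
232237
You can modify certain settings in this file to turn features on or off, such as table recognition:
233238

239+
> [!NOTE]
234240
> If the following items are not present in the JSON, add the required items manually and remove the comments (standard JSON does not support comments).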
235241
236242
```json
@@ -259,13 +265,14 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
259265
- [Ubuntu22.04LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md)
260266
- [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
261267
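- Quick Deployment with Docker
262-
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
263-
>
264-
> Before running this Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker.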
265-
>
266-
> ```bash
267-
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
268-
> ```
268+
> [!IMPORTANT]
269+
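> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
270+
>
271+
> Before running this Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker.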
272+
>
273+
> ```bash
274+
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
275+
> ```
269276
```bash
270277
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
271278
docker build -t mineru:latest .
@@ -329,7 +336,8 @@ magic-pdf -p {some_pdf} -o {some_output_dir} -m auto
329336
└── some_pdf_content_list.json # Rich text JSON arranged in reading order
330337
```
331338

332-
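For more information about the output files, see the [Output File Description](docs/output_file_zh_cn.md).
339+
> [!TIP]
340+
> For more information about the output files, see the [Output File Description](docs/output_file_zh_cn.md).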
333341
334342
### API
335343

@@ -380,12 +388,12 @@ TODO
380388

381389
# TODO
382390

383-
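- 🗹 Reading order based on the model
384-
- 🗹 Recognition of tables of contents and lists in the main text
385-
- 🗹 Table recognition
386-
- Code block recognition in the main text
387-
- [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
388-
- Geometric shape recognition
391+
- [x] Reading order based on the model
392+
- [x] Recognition of tables of contents and lists in the main text
393+
- [x] Table recognition
394+
- [ ] Code block recognition in the main text
395+
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
396+
- [ ] Geometric shape recognition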
389397

390398
# Known Issues
391399

docs/README_Ubuntu_CUDA_Acceleration_en_US.md

+11-9
@@ -8,7 +8,8 @@ nvidia-smi
88

99
If you see information similar to the following, it means that the NVIDIA drivers are already installed, and you can skip Step 2.
1010

11-
Notice: `CUDA Version` should be >= 12.1. If the displayed version number is less than 12.1, please upgrade the driver.
11+
> [!NOTE]
12+
> Notice: `CUDA Version` should be >= 12.1. If the displayed version number is less than 12.1, please upgrade the driver.
1213
1314
```plaintext
1415
+---------------------------------------------------------------------------------------+
@@ -64,14 +65,14 @@ conda activate MinerU
6465
```sh
6566
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
6667
```
67-
68-
After installation, make sure to check the version of `magic-pdf` using the following command:
69-
70-
```sh
71-
magic-pdf --version
72-
```
73-
74-
If the version number is less than 0.7.0, please report the issue.
68+
> [!IMPORTANT]
69+
> After installation, make sure to check the version of `magic-pdf` using the following command:
70+
>
71+
> ```sh
72+
> magic-pdf --version
73+
> ```
74+
>
75+
> If the version number is less than 0.7.0, please report the issue.
7576
7677
### 6. Download Models
7778
@@ -84,6 +85,7 @@ Refer to detailed instructions on [how to download model files](how_to_download_
8485
After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
8586
You can find the `magic-pdf.json` file in your user directory.
8687
88+
> [!TIP]
8789
> The user directory for Linux is "/home/username".
8890
8991
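
The version check in the Ubuntu guide above (run `magic-pdf --version` and report anything below 0.7.0) could be automated roughly as follows. This is only a sketch: it assumes the command prints a dotted `X.Y.Z` version number somewhere in its output, which the guide does not show, and `meets_minimum` is a hypothetical helper name rather than part of magic-pdf.

```python
import re


def meets_minimum(version_output, minimum=(0, 7, 0)):
    """Return True if the first X.Y.Z number in the text is >= minimum."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    # Compare as integer tuples so that 0.10.0 sorts above 0.9.2.
    found = tuple(int(part) for part in match.groups())
    return found >= minimum
```

The text to check could be captured with `subprocess.run(["magic-pdf", "--version"], capture_output=True, text=True).stdout`. Comparing integer tuples rather than strings avoids ranking `0.10.0` below `0.9.2`.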

docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md

+9-7
@@ -8,7 +8,8 @@ nvidia-smi
88

99
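If you see information similar to the following, the NVIDIA driver is already installed and you can skip Step 2.
1010

11-
Note: the displayed `CUDA Version` should be >= 12.1. If the displayed version is lower than 12.1, please upgrade the driver.
11+
> [!NOTE]
12+
> The displayed `CUDA Version` should be >= 12.1. If the displayed version is lower than 12.1, please upgrade the driver.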
1213
1314
```plaintext
1415
+---------------------------------------------------------------------------------------+
@@ -65,7 +66,8 @@ conda activate MinerU
6566
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i https://mirrors.aliyun.com/pypi/simple
6667
```
6768

68-
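> ❗️After the download completes, be sure to confirm that the magic-pdf version is correct using the following command:
69+
> [!IMPORTANT]
70+
> After the download completes, be sure to confirm that the magic-pdf version is correct using the following command: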
6971
>
7072
> ```bash
7173
> magic-pdf --version
@@ -83,7 +85,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
8385
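After completing the [6. Download Models](#6-下载模型) step, the script automatically generates the magic-pdf.json file in the user directory and configures the default model path.
8486
You can find the magic-pdf.json file in your 【user directory】.
8587
86-
88+
> [!TIP]
8789
> The user directory for Linux is "/home/username".
8890
8991
## 8. First Run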
@@ -112,8 +114,8 @@ magic-pdf -p small_ocr.pdf -o ./output
112114
```bash
113115
magic-pdf -p small_ocr.pdf -o ./output
114116
```
115-
116-
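> Tip: You can roughly judge whether CUDA acceleration is working from the per-stage cost times printed in the log. Typically, `layout detection cost` and `mfr time` should be more than 10 times faster.
117+
> [!TIP]
118+
> You can roughly judge whether CUDA acceleration is working from the per-stage cost times printed in the log. Typically, `layout detection cost` and `mfr time` should be more than 10 times faster.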
117119
118120
## 10. Enable CUDA Acceleration for OCR
119121
@@ -128,5 +130,5 @@ python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.
128130
```bash
129131
magic-pdf -p small_ocr.pdf -o ./output
130132
```
131-
132-
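> Tip: You can roughly judge whether CUDA acceleration is working from the per-stage cost times printed in the log. Typically, `ocr cost` should be more than 10 times faster.
133+
> [!TIP]
134+
> You can roughly judge whether CUDA acceleration is working from the per-stage cost times printed in the log. Typically, `ocr cost` should be more than 10 times faster.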

docs/README_Windows_CUDA_Acceleration_en_US.md

+5-3
@@ -28,7 +28,8 @@ conda activate MinerU
2828
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
2929
```
3030

31-
> ❗️After installation, verify the version of `magic-pdf`:
31+
> [!IMPORTANT]
32+
> After installation, verify the version of `magic-pdf`:
3233
>
3334
> ```bash
3435
> magic-pdf --version
@@ -45,6 +46,7 @@ Refer to detailed instructions on [how to download model files](how_to_download_
4546
After completing the [5. Download Models](#5-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
4647
You can find the `magic-pdf.json` file in your 【user directory】 .
4748
49+
> [!TIP]
4850
> The user directory for Windows is "C:/Users/username".
4951
5052
### 7. First Run
@@ -65,8 +67,8 @@ If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA-
6567
```
6668
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
6769
```
68-
69-
> ❗️Ensure the following versions are specified in the command:
70+
> [!IMPORTANT]
71+
> Ensure the following versions are specified in the command:
7072
>
7173
> ```
7274
> torch==2.3.1 torchvision==0.18.1

docs/README_Windows_CUDA_Acceleration_zh_CN.md

+9-6
@@ -29,7 +29,8 @@ conda activate MinerU
2929
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i https://mirrors.aliyun.com/pypi/simple
3030
```
3131

32-
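> ❗️After the download completes, be sure to confirm that the magic-pdf version is correct using the following command:
32+
> [!IMPORTANT]
33+
> After the download completes, be sure to confirm that the magic-pdf version is correct using the following command: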
3334
>
3435
> ```bash
3536
> magic-pdf --version
@@ -46,7 +47,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
4647
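After completing the [5. Download Models](#5-下载模型) step, the script automatically generates the magic-pdf.json file in the user directory and configures the default model path.
4748
You can find the magic-pdf.json file in your 【user directory】.
4849
49-
50+
> [!TIP]
5051
> The user directory for Windows is "C:/Users/username".
5152
5253
## 7. First Run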
@@ -68,7 +69,8 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
6869
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
6970
```
7071
71-
> ❗️Be sure to specify the following versions in the command:
72+
> [!IMPORTANT]
73+
> Be sure to specify the following versions in the command:
7274
>
7375
> ```bash
7476
> torch==2.3.1 torchvision==0.18.1
@@ -90,7 +92,8 @@ pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https
9092
magic-pdf -p small_ocr.pdf -o ./output
9193
```
9294
93-
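> Tip: You can roughly judge whether CUDA acceleration is working from the per-stage times printed in the log. Typically, `layout detection time` and `mfr time` should be more than 10 times faster.
95+
> [!TIP]
96+
> You can roughly judge whether CUDA acceleration is working from the per-stage times printed in the log. Typically, `layout detection time` and `mfr time` should be more than 10 times faster.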
9497
9598
## 9. Enable CUDA Acceleration for OCR
9699
@@ -105,5 +108,5 @@ pip install paddlepaddle-gpu==2.6.1
105108
```bash
106109
magic-pdf -p small_ocr.pdf -o ./output
107110
```
108-
109-
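> Tip: You can roughly judge whether CUDA acceleration is working from the per-stage cost times printed in the log. Typically, `ocr time` should be more than 10 times faster.
111+
> [!TIP]
112+
> You can roughly judge whether CUDA acceleration is working from the per-stage cost times printed in the log. Typically, `ocr time` should be more than 10 times faster.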

docs/how_to_download_models_en.md

+3-2
@@ -20,12 +20,13 @@ The configuration file can be found in the user directory, with the filename `ma
2020

2121
## 1. Models downloaded via Git LFS
2222

23+
> [!IMPORTANT]
2324
> Due to feedback from some users that downloading model files using git lfs was incomplete or resulted in corrupted model files, this method is no longer recommended.
25+
>
26+
> For versions 0.9.x and later, due to the repository change and the addition of the layout sorting model in PDF-Extract-Kit 1.0, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.
2427
2528
When magic-pdf <= 0.8.1, if you have previously downloaded the model files via git lfs, you can navigate to the previous download directory and update the models using the `git pull` command.
2629

27-
> For versions 0.9.x and later, due to the repository change and the addition of the layout sorting model in PDF-Extract-Kit 1.0, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.
28-
2930
## 2. Models downloaded via Hugging Face or Model Scope
3031

3132
If you previously downloaded models via Hugging Face or Model Scope, you can rerun the Python script used for the initial download. This will automatically update the model directory to the latest version.
