mindspore-lab · babahaochi · Jan 6, 2026
diff --git a/Season2.step_into_llm/14.chatpdf/ChatPDF/README.md b/Season2.step_into_llm/14.chatpdf/ChatPDF/README.md
@@ -1,48 +1,212 @@
 # ChatPDF
 
-The ChatPDF(PDF Chatbot) is an application that allows users to upload PDF files and interact with pdf using a chatbot. Users can ask questions or provide input, and the chatbot will generate responses based on the provided information.
+ChatPDF (PDF Chatbot) is an application that lets users upload PDF files and ask questions about them through a chatbot. The chatbot generates its responses from the content of the uploaded files.
 
 ## Technologies Used
 
-- MindSpore
-- MindNLP
-- ms2vec
-- msimilarities
+- **MindSpore**: Deep learning framework (>= 2.6.0)
+- **MindNLP**: Natural Language Processing toolkit (>= 0.4.1)
+- **ms2vec**: Sentence embedding library
+- **msimilarities**: Similarity search library
+- **Gradio**: Web UI framework
 
+## Version Requirements
 
-## Demo Video
+- **Python**: >= 3.9 (Recommended: 3.11)
+- **MindSpore**: >= 2.6.0
+- **MindNLP**: >= 0.4.1 (Tested with 0.4.1)
 
-[Demo Video]()
-
-[![ChatPDF](./assets/chatpdf.png)]()
+**Note**: This code has been migrated to be compatible with MindSpore 2.6.0+ and MindNLP 0.4.0+. If you need to use older versions, please check the git history for the previous version.
 
 ## Installation
 
+### Option 1: Quick Installation (Recommended)
+
+Use the provided installation script:
+
+```bash
+# Windows
+install.bat
+
+# Linux/Mac
+bash install.sh
+```
+
+### Option 2: Manual Installation
+
 1. Clone the repository:
 
-   ```bash
-   git clone https://github.com/lvyufeng/ChatPDF.git
-   ```
+```bash
+git clone https://github.com/mindspore-courses/step_into_llm.git
+cd step_into_llm/Season2.step_into_llm/14.chatpdf
+```
+
+2. Create a Python 3.11 virtual environment:
+
+```bash
+# Windows
+python -m venv chatpdf_env
+chatpdf_env\Scripts\activate
+
+# Linux/Mac
+python3.11 -m venv chatpdf_env
+source chatpdf_env/bin/activate
+```
+
+3. Install dependencies with locked versions:
+
+```bash
+# Upgrade pip
+python -m pip install --upgrade pip
+
+# Install MindSpore 2.6.0 (CPU version)
+pip install mindspore==2.6.0 -f https://www.mindspore.cn/install
+
+# Install MindNLP 0.4.1
+pip install mindnlp==0.4.1
+
+# Install other dependencies
+pip install -r ChatPDF/requirements_locked.txt
+```
+
+### Option 3: Using requirements.txt
+
+For development or testing, you can use the standard requirements file:
+
+```bash
+pip install -r ChatPDF/requirements.txt
+```
+
+**Warning**: `requirements.txt` pins only MindSpore and MindNLP. The remaining packages use minimum-version bounds, so pip may install newer releases that have not been tested with this code. For production use, please use `requirements_locked.txt`.
+
+## Migration Notes (From MindNLP 0.3.x to 0.4.x)
+
+This code has been migrated according to [PR #1519](https://github.com/mindspore-lab/mindnlp/pull/1519/files). Key changes:
+
+### 1. Model Training Mode API
+**Before (MindNLP < 0.4.0):**
+```python
+model.set_train(False)
+```
 
-2. Install the required dependencies:
+**After (MindNLP >= 0.4.0):**
+```python
+model.set_train(mode=False)
+```
 
-   ```bash
-   pip install -r requirements.txt
-   ```
+### 2. Functional API Changes
+**Before:**
+```python
+ops.log_softmax(logits, axis=-1)
+ops.gather_elements(logits, dim=-1, index=labels)
+ops.cat(tensors, axis=0)
+```
+
+**After:**
+```python
+F.log_softmax(logits, dim=-1)  # Using functional module
+ops.gather(logits, dim=-1, index=labels)
+ops.cat(tensors, dim=0)
+```
+
+### 3. Optimizer Import Changes
+**Before:**
+```python
+import mindspore.experimental.optim as optim
+```
+
+**After:**
+```python
+from mindnlp.core import optim
+```
 
 ## Usage
 
 1. Run the application:
 
-   ```bash
-   # complex version
-   python simple_ui.py
-   # complex version
-   python complex_ui.py
-   ```
+```bash
+# Simple version
+python simple_ui.py
+
+# Complex version with PDF preview
+python complex_ui.py
+```
+
+2. Access the application in your web browser at `http://localhost:8082`
+
+3. Upload a PDF file using the "Upload PDF" button
+
+4. Ask questions in the chatbox
+
+## API Usage Example
+
+```python
+import sys
+sys.path.insert(0, 'ChatPDF')
+from chatpdf import ChatPDF
+
+# Initialize ChatPDF with default settings
+chatpdf = ChatPDF(
+    generate_model_name_or_path="01-ai/Yi-6B-Chat",
+    corpus_files="sample.pdf",
+    chunk_size=250
+)
+
+# Ask a question
+response, references = chatpdf.predict("What is the main topic of this paper?")
+print(response)
+```
+
+## Directory Structure
+
+```
+14.chatpdf/
+├── ChatPDF/
+│   ├── chatpdf.py           # Main ChatPDF class
+│   ├── logic.py             # UI interaction logic
+│   ├── simple_ui.py         # Simple Gradio UI
+│   ├── complex_ui.py        # Advanced UI with PDF preview
+│   ├── requirements.txt     # Dependency specifications
+│   ├── requirements_locked.txt  # Version-locked dependencies
+│   ├── sample.pdf           # Example PDF file
+│   └── README.md            # This file
+├── install.bat              # Windows installation script
+├── install.sh               # Linux/Mac installation script
+└── test_full_functionality.py  # Test suite
+```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **ImportError: No module named 'winfcntlock'**
+   - This is an optional dependency of msimilarities for file locking on Windows
+   - The application will work normally without it
+   - Solution: Ignore this warning
+
+2. **Memory Issues with Large Models**
+   - Reduce `chunk_size` parameter (e.g., 100-150)
+   - Use a smaller model (e.g., use a 6B model instead of 34B)
+   - Ensure sufficient RAM is available
+
+3. **Model Download Issues**
+   - Check network connection
+   - Try using a different mirror (ModelScope/HuggingFace)
+   - Configure proxy if needed
+
+### Performance Tips
+
+- Use GPU acceleration if available (install the CUDA version of MindSpore)
+- Reduce `similarity_top_k` and `rerank_top_k` for faster responses
+- Use batch processing for multiple queries
+
+## License
 
-2. Access the application in your web browser as specified in the console.
+MIT License - See LICENSE file for details.
 
-3. To preview a PDF file, click the "Upload PDF" button and select the PDF file from your local machine. The application will display a preview of the PDF file.
+## References
 
-5. Use the chatbox to ask questions or have a conversation with the chatbot. The chatbot will generate responses based on the input.
+- [MindSpore](https://www.mindspore.cn/)
+- [MindNLP](https://github.com/mindspore-lab/mindnlp)
+- [Step into LLM Course](https://github.com/mindspore-courses/step_into_llm)
+- [PR #1519 - GPT Summarization Fix](https://github.com/mindspore-lab/mindnlp/pull/1519)
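
The README suggests lowering `chunk_size` to reduce memory use. As a rough illustration of what such a parameter controls, here is a minimal sketch that packs words into chunks under a character budget. It assumes `chunk_size` is a per-chunk character limit; `split_into_chunks` is a hypothetical helper, and the actual splitting logic in `chatpdf.py` may differ.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 250) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    Each chunk holds at most chunk_size characters, except that a single
    word longer than chunk_size is kept whole in a chunk of its own.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            current = candidate  # the word still fits (or starts a new chunk)
        else:
            chunks.append(current)  # close the full chunk
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "MindSpore is a deep learning framework. " * 20
    # A smaller budget produces more, shorter chunks.
    print(len(split_into_chunks(sample, 250)), len(split_into_chunks(sample, 100)))
```

Smaller chunks mean shorter passages are handed to the generator per retrieved reference, at the cost of more corpus entries to embed and search.
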
diff --git a/Season2.step_into_llm/14.chatpdf/ChatPDF/chatpdf.py b/Season2.step_into_llm/14.chatpdf/ChatPDF/chatpdf.py
@@ -122,7 +122,7 @@ def __init__(
             self,
             similarity_model: SimilarityABC = None,
             generate_model_type: str = "auto",
-            generate_model_name_or_path: str = "01ai/Yi-6B-Chat",
+            generate_model_name_or_path: str = "01-ai/Yi-6B-Chat",
             lora_model_name_or_path: str = None,
             corpus_files: Union[str, List[str]] = None,
             save_corpus_emb_dir: str = "./corpus_embs/",
@@ -183,7 +183,7 @@ def __init__(
         if rerank_model_name_or_path:
             self.rerank_tokenizer = AutoTokenizer.from_pretrained(rerank_model_name_or_path, mirror='modelscope')
             self.rerank_model = AutoModelForSequenceClassification.from_pretrained(rerank_model_name_or_path, mirror='modelscope')
-            self.rerank_model.set_train(False)
+            self.rerank_model.set_train(mode=False)
         else:
             self.rerank_model = None
             self.rerank_tokenizer = None
@@ -219,7 +219,7 @@ def _init_gen_model(
                 peft_name,
             )
             logger.info(f"Loaded peft model from {peft_name}")
-        model.set_train(False)
+        model.set_train(mode=False)
         return model, tokenizer
 
     def _get_chat_input(self):
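
The two hunks above switch `set_train(False)` to `set_train(mode=False)`. If the same code has to run against model classes that accept only one of the two call forms, a small wrapper can try the keyword form first. This is a minimal sketch under that assumption; `set_inference_mode` and `_DemoModel` are hypothetical names, not part of MindSpore or MindNLP.

```python
def set_inference_mode(model):
    """Switch a model to inference mode, whichever call form it supports."""
    set_train = getattr(model, "set_train", None)
    if callable(set_train):
        try:
            set_train(mode=False)  # keyword form used after this PR
        except TypeError:
            set_train(False)  # positional form used before the migration
        return model
    if callable(getattr(model, "eval", None)):
        model.eval()  # fallback for models that only expose eval()
        return model
    raise AttributeError("model exposes neither set_train() nor eval()")


class _DemoModel:
    """Stand-in for a real model; accepts only the keyword form."""

    def __init__(self):
        self.training = True

    def set_train(self, *, mode=True):
        self.training = mode


if __name__ == "__main__":
    demo = set_inference_mode(_DemoModel())
    print(demo.training)
```

Note that the `TypeError` fallback also fires if `set_train` raises `TypeError` for an unrelated reason, so a real project may prefer to pin one call form instead.
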

diff --git a/Season2.step_into_llm/14.chatpdf/ChatPDF/logic.py b/Season2.step_into_llm/14.chatpdf/ChatPDF/logic.py
@@ -3,7 +3,16 @@
 import gradio as gr
 from chatpdf import ChatPDF
 
-model = ChatPDF()
+# Use lazy loading so the model is not loaded at import time
+_model = None
+
+def _get_model():
+    """Load the model lazily on first use."""
+    global _model
+    if _model is None:
+        _model = ChatPDF()
+    return _model
+
 # Function to add text to the chat history
 def add_text(history, text):
     """
@@ -23,6 +32,7 @@ def add_text(history, text):
 
 
 def predict_stream(message, history):
+    model = _get_model()
     history_format = []
     for human, assistant in history:
         history_format.append([human, assistant])
@@ -46,6 +56,7 @@ def generate_response(history, query, btn):
     if not btn:
         raise gr.Error(message='Upload a PDF')
 
+    model = _get_model()
     history_format = []
     for human, assistant in history:
         history_format.append([human, assistant])
@@ -66,6 +77,7 @@ def render_file(file):
         PIL.Image.Image: The rendered page as an image.
     """
     # global n
+    model = _get_model()
     model.reset_corpus(file)
     doc = fitz.open(file.name)
     page = doc[0]
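
The `_get_model()` helper above defers model construction to the first call. If several UI handlers can run at the same time (an assumption about the deployment, not something the PR states), two of them could both see `_model is None` and build the model twice. A minimal sketch of a lock-guarded variant follows; `LazyModel` is a hypothetical wrapper, and the factory stands in for `ChatPDF`.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Build an expensive object once, on first access, even under threads."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: skip the lock once the instance exists.
        if self._instance is None:
            with self._lock:
                # Re-check inside the lock; another thread may have won the race.
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


if __name__ == "__main__":
    # In logic.py this could look like: _model = LazyModel(ChatPDF)
    lazy = LazyModel(lambda: {"loaded": True})
    print(lazy.get() is lazy.get())
```

The sketch treats `None` as "not built yet", so the factory must not return `None`.
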

diff --git a/Season2.step_into_llm/14.chatpdf/ChatPDF/requirements.txt b/Season2.step_into_llm/14.chatpdf/ChatPDF/requirements.txt
@@ -1,9 +1,30 @@
-mindspore
-mindnlp
-PyMuPDF
-ms2vec
-msimilarities
-loguru
-jieba
-gradio
-PyPDF2
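+# MindSpore & MindNLP core dependencies (versions must be pinned)
+mindspore==2.6.0
+mindnlp==0.4.1
+
+# PDF processing
+PyMuPDF>=1.26.0
+PyPDF2>=3.0.0
+
+# Similarity search (built on MindSpore)
+ms2vec>=0.0.2
+msimilarities>=0.0.2
+
+# Logging and Chinese text processing
+loguru>=0.7.0
+jieba>=0.42.1
+
+# Web UI
+gradio>=4.0.0
+
+# Common dependencies (suggested versions)
+# numpy<2.0.0,>=1.20.0  # installed automatically by mindspore
+# tokenizers==0.19.1    # installed automatically by mindnlp
+# sentencepiece>=0.1.99 # installed automatically by mindnlp
+# protobuf>=3.13.0      # installed automatically by mindspore
+# scipy>=1.5.4          # installed automatically by mindspore
+# pillow>=10.0.0        # installed automatically by mindspore
+
+# Optional: for development and testing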
+# pytest>=7.0.0
+# ipython>=8.0.0
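
Since the README states minimum versions for Python, MindSpore, and MindNLP, a startup check can surface a mismatched environment before any model is loaded. The following is a minimal sketch using only the standard library; `version_tuple`, `is_older`, and `check_environment` are hypothetical helpers, and the parser only reads leading numeric components, so it is not a full PEP 440 comparison.

```python
import sys
from importlib import metadata
from typing import Dict, Tuple

# Minimums taken from the README's "Version Requirements" section.
MINIMUMS = {"mindspore": "2.6.0", "mindnlp": "0.4.1"}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Read the leading numeric components, e.g. '2.6.0rc1' -> (2, 6, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed: str, minimum: str) -> bool:
    """Compare two versions after zero-padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_environment(minimums=MINIMUMS, min_python=(3, 9)) -> Dict[str, str]:
    """Return a mapping of component name to problem description."""
    problems: Dict[str, str] = {}
    if sys.version_info[:2] < tuple(min_python):
        problems["python"] = "found %d.%d" % sys.version_info[:2]
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if is_older(installed, minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for component, problem in check_environment().items():
        print(f"{component}: {problem}")
```

An empty result means every listed component meets its minimum.
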
diff --git a/Season2.step_into_llm/14.chatpdf/ChatPDF/simple_ui.py b/Season2.step_into_llm/14.chatpdf/ChatPDF/simple_ui.py
@@ -68,20 +68,17 @@ def predict(message, history):
         avatar_images=(
             os.path.join(pwd_path, "assets/user.png"),
             os.path.join(pwd_path, "assets/llama.png"),
-        ), bubble_full_width=False)
+        ))
     title = " 🎉ChatPDF WebUI🎉 "
     description = "Link in Github: [lvyufeng/ChatPDF](https://github.com/lvyufeng/ChatPDF)"
-    css = """.toast-wrap { display: none !important } """
     examples = ['Can you tell me about the NLP?', '介绍下NLP']
     chat_interface_stream = gr.ChatInterface(
         predict_stream,
         textbox=gr.Textbox(lines=4, placeholder="Ask me question", scale=7),
         title=title,
         description=description,
         chatbot=chatbot_stream,
-        css=css,
         examples=examples,
-        theme='soft',
     )
 
     with gr.Blocks() as demo: