Fixing garbled text in data_collection (probably related to the Python version) #46

TreeECNU · 2023-12-06T11:15:54Z

If running print("标题:", title.text) in data_collection prints garbled Chinese characters (possibly related to the Python version; mine is 3.11.6), try changing the code to the following:

import requests
from bs4 import BeautifulSoup

# Send a GET request
response = requests.get("https://baidu.com")
# Get the page content as raw bytes
html_content = response.content  # changed from response.text to response.content
# Print the page content
print(html_content)
# Parse the HTML with Beautiful Soup
soup = BeautifulSoup(html_content, 'html.parser')
# Look up a specific tag
title = soup.title
print("标题:", title.text)

With this change the title should print correctly.

TreeECNU · 2023-12-06T11:35:48Z

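However, I just noticed that the file I downloaded differs in places from the file the TA posted. Following the file the TA posted in the issue works without any problems.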

thirstylearning · 2024-01-14T07:14:37Z

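It looks like your change replaces response.text with response.content. The two differ in how the encoding is handled, so the garbled text was most likely caused by an encoding mismatch.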
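
To make the encoding point concrete: in requests, response.content is the raw bytes of the body, while response.text is a str decoded with response.encoding, which falls back to ISO-8859-1 for text content types that declare no charset. Beautiful Soup, when given bytes, detects the encoding itself. The sketch below reproduces the effect offline with a hypothetical UTF-8 page; the page content and the variable names are made up for illustration, and whether this is exactly what happened in the original report is an assumption.

```python
# Hypothetical response body: an HTML page encoded as UTF-8.
raw_bytes = "<html><head><title>百度一下</title></head></html>".encode("utf-8")

# Roughly what response.text yields when the charset is guessed wrong
# (ISO-8859-1 is assumed here as the wrongly chosen charset).
wrong_text = raw_bytes.decode("iso-8859-1")

# What you get when the bytes are decoded with the charset actually used.
right_text = raw_bytes.decode("utf-8")

print("title readable after wrong decode:", "百度一下" in wrong_text)
print("title readable after right decode:", "百度一下" in right_text)

# The wrong decode loses nothing; re-encoding gives back the original bytes.
print("bytes recoverable:", wrong_text.encode("iso-8859-1") == raw_bytes)
```

If you would rather keep using response.text, another option that may work is setting response.encoding = response.apparent_encoding before reading it, so that requests decodes with the charset it detects from the body.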
