
Fixing garbled text in data_collection (probably related to the Python version) #46

Open
TreeECNU opened this issue Dec 6, 2023 · 2 comments


@TreeECNU

TreeECNU commented Dec 6, 2023

If running print("标题:", title.text) in data_collection prints garbled Chinese text (this may depend on the Python version; mine is 3.11.6), try changing the code to the following:

import requests
from bs4 import BeautifulSoup

# Send a GET request
response = requests.get("https://baidu.com")
# Get the page content as raw bytes
html_content = response.content  # changed from the original response.text to response.content
# Print the page content
print(html_content)
# Parse the HTML with Beautiful Soup (given bytes, it detects the encoding itself)
soup = BeautifulSoup(html_content, 'html.parser')
# Find a specific tag
title = soup.title
print("标题:", title.text)

That should fix it.
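Why passing bytes helps: when BeautifulSoup receives bytes, it works out the encoding itself (for example from the page's `<meta charset>` declaration) instead of relying on the encoding `requests` inferred. The stdlib-only sketch below imitates that idea with a hypothetical helper, `decode_html`. It is a simplified illustration that assumes the page declares its charset in a `<meta>` tag, not a description of how BeautifulSoup actually implements detection.

```python
import re
from typing import Optional

# Hypothetical pattern: matches <meta charset="..."> as well as
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw: bytes, header_charset: Optional[str] = None) -> str:
    """Decode raw HTML bytes, preferring the charset declared in the page.

    Order of preference (an assumption for this sketch): <meta> charset,
    then the charset from the HTTP header, then UTF-8.
    """
    candidates = []
    match = _META_CHARSET.search(raw[:2048])  # declarations sit near the top
    if match:
        candidates.append(match.group(1).decode("ascii"))
    if header_charset:
        candidates.append(header_charset)
    candidates.append("utf-8")

    for charset in candidates:
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name, or bytes that don't fit this codec
    # Last resort: keep going with replacement characters
    return raw.decode("utf-8", errors="replace")
```

With `requests`, a rough equivalent would be `decode_html(response.content, response.encoding)`. In practice, handing `response.content` straight to BeautifulSoup, as in the snippet above, lets the library do this detection for you.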

@TreeECNU
Author

TreeECNU commented Dec 6, 2023

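However, I just noticed that the files I downloaded differ in places from the ones the TA posted. Following the file the TA shared in the issue works without any problems.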

@thirstylearning

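It looks like your change replaces response.text with response.content. The two differ in how encoding is handled: response.text is the response body already decoded to a string using the encoding requests infers (mainly from the HTTP headers), while response.content is the undecoded raw bytes. So the garbled text is most likely caused by requests picking the wrong encoding.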
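A minimal stdlib-only simulation of that mismatch, with no network request involved: `response.text` is essentially `response.content` decoded with `response.encoding`, and `requests` falls back to ISO-8859-1 when a `text/*` response's `Content-Type` header declares no charset. The sample string and variable names are illustrative assumptions, not taken from the actual page.

```python
# Simulated raw bytes for a UTF-8 page title (what response.content would hold)
raw = "标题: 百度一下".encode("utf-8")

# What response.text would produce if requests picked ISO-8859-1
garbled = raw.decode("iso-8859-1")

# Decoding the same bytes with the page's real encoding gives readable text
correct = raw.decode("utf-8")

# ISO-8859-1 maps every byte 1:1 to a code point, so the mistake is reversible
recovered = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))       # mojibake, with control characters shown as escapes
print(correct)
print(recovered == correct)
```

Because ISO-8859-1 can decode any byte sequence, the wrong guess never raises an error; it silently yields garbled text, which is why the problem only shows up when the title is printed.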
