Web page is returned with the wrong encoding #264

Open
ruikai0103 opened this issue Sep 5, 2024 · 1 comment

Comments

@ruikai0103

Notice

Upgrade feapder and make sure you are on the latest version. If the bug still exists, describe the problem in detail.

pip install --upgrade feapder

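Problem
When I request https://www.bookschina.com/8342179.htm with feapder, the strings in the returned page are garbled, although the same request made with requests returns correct data. I also tried passing auto_request=False and fetching the page manually with requests inside the callback. The data returned by requests is correct, but once the response is converted with response = feapder.Response(response), the strings become garbled.
I have already tried setting response.code = "utf-8" and "gb2312"; neither works.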
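For readers unfamiliar with this kind of garbling, here is a minimal sketch of how the same bytes can read correctly or as mojibake depending on the codec used to decode them. The sample string and the assumption that the page is served in a GBK-family encoding are illustrative only; the issue does not confirm the site's actual encoding.

```python
# Hypothetical sample text; the real page content is not reproduced here.
title = "中国图书网"

# Assumption: the server sends bytes in a GBK-family encoding.
raw = title.encode("gbk")

# Decoding with the matching codec recovers the original text.
decoded_ok = raw.decode("gbk")

# Decoding the same bytes as UTF-8 fails; with errors="replace" the
# invalid bytes turn into U+FFFD replacement characters.
decoded_bad = raw.decode("utf-8", errors="replace")

# Decoding as latin-1 never fails, but yields unrelated Western characters.
decoded_latin = raw.decode("latin-1")

print(decoded_ok)
print(decoded_bad)
print(decoded_latin)
```

If the two libraries pick different codecs for the same bytes, this would be consistent with requests printing readable text while the wrapped response does not.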
Screenshot
image

Code

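import feapder
import requests

# Note: sui_dao_proxies() is the reporter's own proxy helper; its definition is not included in the issue.
headers = {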
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Pragma": "no-cache",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "sec-ch-ua": "\"Chromium\";v=\"124\", \"Google Chrome\";v=\"124\", \"Not-A.Brand\";v=\"99\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\""
}


class AirSpiderDemo(feapder.AirSpider):
    def start_requests(self):
        url = "https://www.bookschina.com/8342179.htm"
        yield feapder.Request(url, method="GET", auto_request=False)

    def download_midware(self, request):
        request.headers = headers
        request.proxies = sui_dao_proxies()
        request.cookies = {
            # "BookUser": "1%7c2e9892dc-7c4f-47d1-95f0-2ebd328c90bf%7c1%7c0%7c638620898507693730%7c20180722%7c337457b7db499919",
            # "UserSign": "069f073dff21b10b",
            # "ASP.NET_SessionId": "rrwxo4jepzlcbw5yy0h2jw4y",
            # "UserUnionId": "de943031-e334-4f0b-8d5c-907cfd37b467",
            # "booklisthistory": "8342179,7733959,8300491,7438304,9103303,6909214,7156650,6900090,8898194,8989529"
        }
        return request

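    def parse(self, request, response):
        # auto_request=False, so the page is fetched manually with requests.
        response = requests.get(request.url, proxies=sui_dao_proxies())
        print(response.text)  # the text is correct at this point
        # After wrapping the requests response in feapder.Response, the text is garbled.
        response = feapder.Response(response)
        # response.encoding_errors = 'replace'
        title = response.xpath("//h1/text()").extract_first()
        # print(response.text)
        print(response)
        print(title)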


if __name__ == "__main__":
    AirSpiderDemo(thread_count=1).start()
@ruikai0103
Author

Temporary workaround

response1 = requests.get(request.url, proxies=sui_dao_proxies())
response = feapder.Response(response1)
response.text = response1.text

When the text comes back garbled, manually overwrite the text of the feapder Response with the text from requests.
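A variation on this workaround is to decode the raw bytes explicitly instead of relying on either library's encoding detection. The sketch below is only an illustration: `decode_html` is a hypothetical helper, not part of feapder or requests, and the candidate list (UTF-8 first, then GB18030 as a superset of GBK) is an assumption about the page, not something the issue confirms.

```python
from typing import Iterable, Optional


def decode_html(
    content: bytes,
    candidates: Iterable[str] = ("utf-8", "gb18030"),
) -> Optional[str]:
    """Decode content with the first candidate codec that succeeds strictly.

    Returns None if no candidate can decode the bytes without errors.
    """
    for encoding in candidates:
        try:
            return content.decode(encoding)
        except UnicodeDecodeError:
            # This codec does not match the bytes; try the next one.
            continue
    return None


# Hypothetical sample bytes standing in for the downloaded page body.
sample = "中国图书网".encode("gbk")
print(decode_html(sample))
```

In the callback, the result could presumably be assigned the same way as in the workaround above, for example `response.text = decode_html(response1.content)`, after checking that it is not None.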
