【Discuz】 Cannot read properties of undefined (reading 'content-type'): testing requested. Proposed fix: use the data returned by got directly instead of decoding it with iconv #15765

Open
1 task done
KwToPA opened this issue May 30, 2024 · 12 comments
Labels
RSS bug Something isn't working

KwToPA commented May 30, 2024

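Route

/discuz/:ver{[7x]}/:link{.+}

Full route

/discuz/:ver{[7x]}/:cid{[0-9]{2}}/:link{.+},/:ver{[7x]}/:link{.+},/:link{.+}

Related documentation

https://docs.rsshub.app/routes/other#discuz

What is expected?

The feed is delivered normally.

What actually happened?

The route has been broken for a long time.

Deployment

RSSHub demo (https://rsshub.app)

Deployment information

No response

Additional info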

Error Message:
TypeError: Cannot read properties of undefined (reading 'content-type')

Route: /discuz/:ver{[7x]}/:link{.+}

Full Route: /discuz/x/https%3a%2f%2fbbs.shuiguobang.com%2fforum.php%3Fmod%3Dforumdisplay%26fid%3D396%26filter%3Dauthor%26orderby%3Ddateline

Node Version: v21.7.3

Git Hash: 2d1a0b7

This is not a duplicated issue

  • I have searched the existing issues to make sure this bug has not already been reported.

KwToPA added the RSS bug label May 30, 2024
github-actions bot commented

Searching for maintainers:
  • /discuz/:ver{[7x]}/:link{.+}: No maintainer listed, possibly a v1 or misconfigured route

To maintainers: if you are not willing to be disturbed, list your username in scripts/workflow/test-issue/call-maintainer.js. In this way, your username will be wrapped in an inline code block when tagged so you will not be notified.

If all routes can not be found, the issue will be closed automatically. Please use NOROUTE for a route-irrelevant issue or leave a comment if it is a mistake.

KwToPA commented Jun 2, 2024

Could this be the cause? #15045 (comment)

KwToPA commented Jun 2, 2024

I think commit 6a89b50 could serve as a reference:

fix(route): /163/dy2/:id, remove iconv module, use data from got directly instead of decoding data with content-type

KwToPA commented Jun 2, 2024

https://raw.githubusercontent.com/DIYgod/RSSHub/master/lib/routes/discuz/discuz.ts

iconv is used here as well:

import iconv from 'iconv-lite';

const responseData = iconv.decode(response.data, charset ?? 'utf-8');
if (!responseData) {
    const description = '获取详细内容失败'; // "Failed to fetch the full content"
    return { description };
}

// If no encoding is specified, default to utf-8
const contentType = response.headers['content-type'] || '';
let $ = load(iconv.decode(responseData, 'utf-8'));
const charset = contentType.match(/charset=([^;]*)/)?.[1] ?? $('meta[charset]').attr('charset') ?? $('meta[http-equiv="Content-Type"]').attr('content')?.split('charset=')?.[1];
if (charset?.toLowerCase() !== 'utf-8') {
    $ = load(iconv.decode(responseData, charset ?? 'utf-8'));
}
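The error in this issue says that `'content-type'` was read from `undefined`, and the excerpt above indexes `response.headers` without a guard. As a minimal sketch only, the lookup can be written so that a missing headers object falls back to the page's `<meta>` tags and then to utf-8. `detectCharset` and `HeaderMap` are hypothetical names, not part of RSSHub, and the regex-based meta lookup stands in for the cheerio selectors used in the route.

```typescript
// Hypothetical helper, not part of RSSHub: choose a charset without
// assuming that a headers object exists on the response.
type HeaderMap = Record<string, string | undefined>;

function detectCharset(headers: HeaderMap | undefined, html: string): string {
    // Optional chaining avoids the TypeError when headers is undefined.
    const contentType = headers?.['content-type'] ?? '';
    const fromHeader = contentType.match(/charset=([^;]*)/i)?.[1];
    // Covers <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="...; charset=...">.
    const fromMeta = html.match(/<meta[^>]*charset=["']?([\w-]+)/i)?.[1];
    // An empty header value falls through to the meta tag, then to utf-8.
    return (fromHeader || fromMeta || 'utf-8').trim().toLowerCase();
}

const noHeaders = detectCharset(undefined, '<meta charset="gbk">');
const withHeader = detectCharset({ 'content-type': 'text/html; charset=GBK' }, '');
```

With this shape, a response that carries no headers still yields a usable charset instead of throwing.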

KwToPA commented Jun 2, 2024

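An AI assistant gave me the following: Based on the modification strategy you provided, we can use the data returned by got directly, without decoding it with iconv. Here is the complete modified discuz.ts: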

import { Route } from '@/types';
import cache from '@/utils/cache';
import got from '@/utils/got'; // use got rather than ofetch
import { load } from 'cheerio';
import { parseDate } from '@/utils/parse-date';
import { config } from '@/config';
import ConfigNotFoundError from '@/errors/types/config-not-found';
import InvalidParameterError from '@/errors/types/invalid-parameter';

function fixUrl(itemLink, baseUrl) {
    // Resolve relative links
    if (itemLink) {
        if (baseUrl && !/^https?:\/\//.test(baseUrl)) {
            baseUrl = /^\/\//.test(baseUrl) ? 'http:' + baseUrl : 'http://' + baseUrl;
        }
        itemLink = new URL(itemLink, baseUrl).href;
    }
    return itemLink;
}

// Shared post-content fetcher for Discuz 7.x and the Discuz X series
async function loadContent(itemLink, header) {
    const response = await got({
        method: 'get',
        url: itemLink,
        responseType: 'text', // changed: use 'text' instead of 'buffer'
        headers: header,
    });

    const responseData = response.body; // changed: use response.body instead of response.data
    const $ = load(responseData);

    const post = $('div#postlist div[id^=post] td[id^=postmessage]').first();

    // fix lazyload image
    post.find('img').each((_, img) => {
        img = $(img);
        if (img.attr('src')?.endsWith('none.gif') && img.attr('file')) {
            img.attr('src', img.attr('file') || img.attr('zoomfile'));
            img.removeAttr('file');
            img.removeAttr('zoomfile');
        }
    });

    // Only fetch the first post of the thread
    const description = post.html();

    return { description };
}

export const route: Route = {
    path: ['/:ver{[7x]}/:cid{[0-9]{2}}/:link{.+}', '/:ver{[7x]}/:link{.+}', '/:link{.+}'],
    name: 'Unknown',
    maintainers: [],
    handler,
};

async function handler(ctx) {
    let link = ctx.req.param('link');
    const ver = ctx.req.param('ver') ? ctx.req.param('ver').toUpperCase() : undefined;
    const cid = ctx.req.param('cid');
    link = link.replace(/:\/\//, ':/').replace(/:\//, '://');

    const cookie = cid === undefined ? '' : config.discuz.cookies[cid];
    if (cookie === undefined) {
        throw new ConfigNotFoundError('缺少对应论坛的cookie.'); // "Missing cookie for this forum."
    }

    const header = {
        Cookie: cookie,
    };

    const response = await got({
        method: 'get',
        url: link,
        responseType: 'text', // changed: use 'text' instead of 'buffer'
        headers: header,
    });

    const responseData = response.body; // changed: use response.body instead of response.data
    const $ = load(responseData);

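    // Fall back to '' so a page without a generator tag reaches the "unsupported version" error below instead of a TypeError.
    const version = ver ? `DISCUZ! ${ver}` : ($('head > meta[name=generator]').attr('content') ?? '');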

    let items;
    if (version.toUpperCase().startsWith('DISCUZ! 7')) {
        // Discuz 7.x series
        // Full-text fetching is supported, limited to 5 pages by default
        const list = $('tbody[id^="normalthread"] > tr')
            .slice(0, ctx.req.query('limit') ? Number.parseInt(ctx.req.query('limit'), 10) : 5)
            .toArray()
            .map((item) => {
                item = $(item);
                const a = item.find('span[id^=thread] a');
                return {
                    title: a.text().trim(),
                    link: fixUrl(a.attr('href'), link),
                    pubDate: item.find('td.author em').length ? parseDate(item.find('td.author em').text().trim()) : undefined,
                    author: item.find('td.author cite a').text().trim(),
                };
            });

        items = await Promise.all(
            list.map((item) =>
                cache.tryGet(item.link, async () => {
                    const { description } = await loadContent(item.link, header);

                    item.description = description;
                    return item;
                })
            )
        );
    } else if (version.toUpperCase().startsWith('DISCUZ! X')) {
        // Discuz X series
        // Full-text fetching is supported, limited to 5 pages by default
        const list = $('tbody[id^="normalthread"] > tr')
            .slice(0, ctx.req.query('limit') ? Number.parseInt(ctx.req.query('limit'), 10) : 5)
            .toArray()
            .map((item) => {
                item = $(item);
                const a = item.find('a.xst');
                return {
                    title: a.text(),
                    link: fixUrl(a.attr('href'), link),
                    pubDate: item.find('td.by:nth-child(3) em span').last().length ? parseDate(item.find('td.by:nth-child(3) em span').last().text().trim()) : undefined,
                    author: item.find('td.by:nth-child(3) cite a').text().trim(),
                };
            });

        items = await Promise.all(
            list.map((item) =>
                cache.tryGet(item.link, async () => {
                    const { description } = await loadContent(item.link, header);

                    item.description = description;
                    return item;
                })
            )
        );
    } else {
        throw new InvalidParameterError('不支持当前Discuz版本.'); // "This Discuz version is not supported."
    }

    return {
        title: $('head > title').text(),
        description: $('head > meta[name=description]').attr('content'),
        link,
        item: items,
    };
}
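The handler above rewrites `link` with two chained `replace` calls before fetching it. The sketch below isolates that step to show its effect. `normalizeLink` is a hypothetical name, the example URLs are made up, and the reading that this repairs a `//` collapsed to `/` somewhere along the request path is an assumption rather than something stated in the thread.

```typescript
// Hypothetical standalone copy of the two chained replacements in the handler.
function normalizeLink(link: string): string {
    // First collapse the first "://" to ":/" so both input forms look alike,
    // then expand the first ":/" back to "://".
    return link.replace(/:\/\//, ':/').replace(/:\//, '://');
}

// A link whose double slash was collapsed, and one that arrived intact.
const collapsed = normalizeLink('https:/bbs.example.com/forum.php?mod=forumdisplay&fid=1');
const intact = normalizeLink('https://bbs.example.com/forum.php?mod=forumdisplay&fid=1');
```

Both inputs end up with a single well-formed `https://` prefix, so the rest of the handler can treat them the same way.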

KwToPA commented Jun 2, 2024

Hi, could you help test the code above and merge it if it works? Thanks.

@TonyRL

KwToPA changed the title from "【Discuz】TypeError: Cannot read properties of undefined (reading 'content-type')" to "【Discuz】 Cannot read properties of undefined (reading 'content-type'): testing requested. Use the data returned by got directly instead of decoding it with iconv" Jun 2, 2024

KwToPA changed the title from "【Discuz】 Cannot read properties of undefined (reading 'content-type'): testing requested. Use the data returned by got directly instead of decoding it with iconv" to "【Discuz】 Cannot read properties of undefined (reading 'content-type'): testing requested. Proposed fix: use the data returned by got directly instead of decoding it with iconv" Jun 2, 2024

HNGHTNLP commented Jun 8, 2024

@junfengP I believe he wrote this route.

KwToPA commented Jun 9, 2024

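Quoting HNGHTNLP: "@junfengP I believe he wrote this route."

That is bad news: he has only opened two issues in the past six months.

In that case, may I file a new RSS request instead? @TonyRL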

junfengP commented

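const responseData = response.body; // changed: use response.body instead of response.data

This change looks workable; you could open a PR to fix the route.
It was written this way because the content returned by the got function used to come back with garbled Chinese. If Chinese-language sites now decode correctly, there is no problem.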
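To illustrate the garbling described here, the sketch below decodes the same bytes twice: once as utf-8 and once as GBK. It uses the standard `TextDecoder` in place of iconv-lite and assumes a runtime whose `TextDecoder` accepts the `gbk` label (for example a Node.js build with full ICU). `decodeBody` is a hypothetical helper, not an RSSHub function.

```typescript
// Hypothetical helper: decode raw response bytes with an explicit charset.
// TextDecoder is swapped in here for iconv-lite; the "gbk" label is assumed
// to be supported by the runtime.
function decodeBody(bytes: Uint8Array, charset: string): string {
    return new TextDecoder(charset).decode(bytes);
}

// The two characters "中文" encoded as GBK.
const gbkBytes = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);

// Decoding GBK bytes as utf-8 yields replacement characters, not readable text.
const garbled = decodeBody(gbkBytes, 'utf-8');
// Decoding with the matching charset recovers the original text.
const readable = decodeBody(gbkBytes, 'gbk');
```

This is why requesting the body as text is only safe when the forum serves utf-8, or when the HTTP layer already decodes with the right charset. A GBK forum would still need the raw bytes and an explicit decode step.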

KwToPA commented Jun 17, 2024

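Quoting junfengP:

const responseData = response.body; // changed: use response.body instead of response.data

This change looks workable; you could open a PR to fix the route. It was written this way because the content returned by the got function used to come back with garbled Chinese. If Chinese-language sites now decode correctly, there is no problem.

Hi, could you submit the code on my behalf?

import got from '@/utils/got';

What is this got function? Can I use it in the browser console to test?

got is a JavaScript library for making HTTP requests. It provides a simple, human-friendly, and powerful API for handling all kinds of HTTP requests. The library runs in Node.js, so it cannot be used directly in the browser console.

Thanks a lot.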
