image rendering times out sporadically #582

Open
jluebbe opened this issue Nov 20, 2024 · 12 comments
Labels
type/bug Something isn't working

Comments

@jluebbe

jluebbe commented Nov 20, 2024

What happened:
After updating grafana (to 11.3.1) and the grafana-image-renderer (to 3.11.6), some image rendering requests fail with a timeout, while most work.
As far as I can tell it affects ~25% of requests.

The log message from the renderer is:

logger=plugin.grafana-image-renderer t=2024-11-20T20:22:04.460896286+01:00 level=debug msg="Render request received" url="http://127.0.0.1:3000/d-solo/zglP5XSWk/node?panelId=4&var-node=74acb92689a8&from=now-1d&width=650&height=350&theme=light&render=1"
logger=plugin.grafana-image-renderer t=2024-11-20T20:22:05.585531541+01:00 level=error msg="Browser request failed" url="http://127.0.0.1:3000/api/ds/query?ds_type=influxdb&requestId=SQR100" method=POST failure=net::ERR_ABORTED
logger=plugin.grafana-image-renderer t=2024-11-20T20:22:34.507991251+01:00 level=error msg="Error while waiting for the panels to load" url="http://127.0.0.1:3000/d-solo/zglP5XSWk/node?panelId=4&var-node=74acb92689a8&from=now-1d&width=650&height=350&theme=light&render=1" err="TargetCloseError: Protocol error (Runtime.callFunctionOn): Target closed
    at CallbackRegistry.clear (/snapshot/src/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:72:36)
    at CdpCDPSession._onClosed (/snapshot/src/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/CDPSession.js:101:25)
    at Connection.onMessage (/snapshot/src/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Connection.js:130:25)
    at WebSocket.<anonymous> (/snapshot/src/node_modules/puppeteer-core/lib/cjs/puppeteer/node/NodeWebSocketTransport.js:44:32)
    at callListener (/snapshot/src/node_modules/ws/lib/event-target.js:290:14)
    at WebSocket.onMessage (/snapshot/src/node_modules/ws/lib/event-target.js:209:9)
    at WebSocket.emit (node:events:537:28)
    at Receiver.receiverOnMessage (/snapshot/src/node_modules/ws/lib/websocket.js:1220:20)
    at Receiver.emit (node:events:537:28)
    at Immediate.<anonymous> (/snapshot/src/node_modules/ws/lib/receiver.js:601:16)"
logger=plugin.grafana-image-renderer t=2024-11-20T20:22:34.508769665+01:00 level=error msg="Render request failed" url="http://127.0.0.1:3000/d-solo/zglP5XSWk/node?panelId=4&var-node=74acb92689a8&from=now-1d&width=650&height=350&theme=light&render=1" error="Error: Timeout hit: 30000"

The influx log doesn't show anything obviously problematic.
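
For reference, the gap between the "Render request received" and "Render request failed" lines above can be checked with a small script. This is only a sketch: `parse_ts` and `elapsed_seconds` are hypothetical helper names, and the regex assumes the `t=` timestamp format shown in the log excerpt.

```python
import re
from datetime import datetime
from typing import Optional

# Matches the t=... field of the renderer log lines quoted above
# (date/time, nanosecond fraction, UTC offset).
_TS = re.compile(r"t=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)([+-]\d{2}:\d{2})")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the timestamp of one log line, or None if it has no t= field."""
    m = _TS.search(line)
    if m is None:
        return None
    base, frac, tz = m.groups()
    micro = frac[:6].ljust(6, "0")  # datetime only stores microseconds
    return datetime.strptime(f"{base}.{micro}{tz}", "%Y-%m-%dT%H:%M:%S.%f%z")


def elapsed_seconds(first_line: str, last_line: str) -> float:
    """Seconds between the timestamps of two log lines."""
    start, end = parse_ts(first_line), parse_ts(last_line)
    if start is None or end is None:
        raise ValueError("both lines need a t=... timestamp")
    return (end - start).total_seconds()


# Shortened copies of the first and last log lines from this report.
RECEIVED = 't=2024-11-20T20:22:04.460896286+01:00 level=debug msg="Render request received"'
FAILED = 't=2024-11-20T20:22:34.508769665+01:00 level=error msg="Render request failed"'

print(round(elapsed_seconds(RECEIVED, FAILED), 2))
```

For the two lines above this gives roughly 30.05 seconds, which lines up with the `Timeout hit: 30000` error if that value is in milliseconds.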

What you expected to happen:
All requests render correctly.

How to reproduce it (as minimally and precisely as possible):
The images are used on public pages: https://freifunk-bs.de/map/#!/en/map/68725132e5cb
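
To put a number on the failure rate, something like the following could be run against a render URL. It is a minimal sketch, not the method used for the ~25% estimate above: `timed_fetch`, `summarize`, `run_attempts` and the 25-second `slow_after` threshold are all assumptions made for illustration.

```python
import time
import urllib.request
from typing import Callable, Dict, List, Tuple

Result = Tuple[bool, float]  # (succeeded, elapsed_seconds)


def timed_fetch(url: str, timeout: float = 60.0) -> Result:
    """Request url once and report whether it returned HTTP 200, plus the time taken."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        ok = False
    return ok, time.monotonic() - start


def summarize(results: List[Result], slow_after: float = 25.0) -> Dict[str, float]:
    """Count requests that failed or took longer than slow_after seconds."""
    bad = [r for r in results if not r[0] or r[1] > slow_after]
    return {
        "attempts": len(results),
        "bad": len(bad),
        "bad_ratio": len(bad) / len(results) if results else 0.0,
        "slowest": max((t for _, t in results), default=0.0),
    }


def run_attempts(url: str, attempts: int,
                 fetch: Callable[[str], Result] = timed_fetch) -> Dict[str, float]:
    """Fetch url `attempts` times in sequence and summarize the outcomes."""
    return summarize([fetch(url) for _ in range(attempts)])
```

Calling `run_attempts(render_url, 20)` with a real render URL returns the counts as a dict. The `fetch` parameter is injectable so the summary logic can be exercised without a Grafana instance.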

Anything else we need to know?:
In the error case, the renderer responds with a truncated error message instead of the image:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
    <meta name="viewport" content="width=device-width" />
    <meta name="theme-color" content="#000" />

    <title>Grafana - Error</title>

    <base href="/grafana/" />

    
    <link rel="stylesheet" href="
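
Since the failure arrives as an HTML error page rather than as an image, a client can tell the two apart by looking at the first bytes of the body. A minimal sketch, assuming the renderer is asked for PNG output; `classify_render_body` is a hypothetical helper name:

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte signature at the start of every PNG file


def classify_render_body(body: bytes) -> str:
    """Return 'png', 'html' or 'unknown' based on the start of a response body."""
    if body.startswith(PNG_MAGIC):
        return "png"
    # Only the beginning matters, so a truncated error page is still recognized.
    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


print(classify_render_body(b'<!DOCTYPE html>\n<html lang="en">'))
```

This checks the content rather than trusting the status code or `Content-Type` header, neither of which is shown in this report.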

Environment:

  • Grafana Image Renderer version: 3.11.6
  • Grafana version: 11.3.1
  • Installed plugin or remote renderer service: installed as plugin
  • OS Grafana Image Renderer is installed on: Debian 12.8
  • User OS & Browser: Debian Chromium, Android, curl
  • Others:
@mikerevellesciplay

mikerevellesciplay commented Nov 25, 2024

had the same issue with 11.3.1, spent more than a week troubleshooting this and it looks like downgrading to 11.2.4 fixed it. directly rendering from the grafana interface (panel -> share -> generate rendered image) works fine but rendering on alert is broken.

@manny4566

I can confirm the problem

@KausL

KausL commented Dec 4, 2024

I can confirm this issue:

  • With Grafana v10 I use the rendering with a link like this

https://10.10.20.7:3000/render/d-solo/dtfndb010/10-system-performance-24h-overview/?panelId=34&width=260&height=260&from=1733237940000&to=1733324340000

The image generation takes around 5s depending on the panel.

  • With Grafana 11.3.1 I use the same URL, but now the image generation takes around 30s. This stretches the generation of my report email from 5min to 30min, which is not really critical but very strange.

After playing around with the renderer settings, I found that the direct link button in the Grafana UI works as fast as under v10. The difference is that the link called by the button contains all the variables defined in the dashboard (whether or not they are used in the panel). The link now looks like this:

https://10.10.20.7:3000/render/d-solo/dtfndb010?orgId=1&from=2024-12-03T15:10:40.882Z&to=2024-12-04T15:10:40.882Z&var-sitea=BRE&var-siteb=HAM&var-sitec=KLN&var-maxOpsRDS=99600&var-maxOpsBDS=28800&var-maxQryBDS=17400&var-maxUpdBDS=3800&var-srvCountBDS=6&var-srvCountRDS=6&var-srvCountBDSOneSite=2&refresh=1m&panelId=panel-34&__feature.dashboardSceneSolo&width=1000&height=500&tz=Europe%2FBerlin

Can anybody explain how this issue/bug can be solved, or how to switch back to the old behaviour?

Thanks in advance!

@mikerevellesciplay

@evictorero @lucychen-grafana do you guys test this stuff?

@KausL

KausL commented Dec 5, 2024

I can also confirm that the issue is solved by downgrading to grafana 11.2.4 as mentioned by @mikerevellesciplay .

@github-project-automation github-project-automation bot moved this to 🗂️ Needs Triage / Escalation in Sharing squad Dec 9, 2024
@jluebbe
Author

jluebbe commented Dec 11, 2024

> had the same issue with 11.3.1, spent more than a week troubleshooting this and it looks like downgrading to 11.2.4 fixed it.

Thanks for the hint, @mikerevellesciplay. I've downgraded Grafana to 11.2.4 and now rendering is reliable.

@Andrew-Hinson

I'm also dealing with this issue. I have an app that "recreates" Grafana dashboards for posting to Slack. Notably, only panels that query Snowflake are having issues, likely due to some small lag, as the queries are fairly quick. Before we upgraded Grafana there were no issues. Now I routinely get blank panels sent to Slack. Downgrading is not an option for us.

@evictorero
Contributor

Hi everyone. Thanks for reporting this. We are trying to triage this issue and isolate this case to be able to reproduce it.
As far as I can tell, all of you are having issues after upgrading Grafana to 11.3.1 and consuming the render API from an external app using the same link you used in older versions. We suspect this problem could be related to a backward-compatibility issue.

Can you help us by answering these questions?

  • When does it start failing? e.g After upgrading to 11.3.1 from 11.2.0
  • Which image renderer version are you using? 
  • Is the image renderer configured as a plugin or as an external service?
  • Which data sources do the failing panels use?
  • Does this issue happen randomly or can you reproduce it consistently?
  • If you grab the URL from the panel UI in Grafana 11.3.1, does it fail?
  • Does adding the query param __feature.dashboardSceneSolo to the old URL make it work?
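
For the last question, one way to append the parameter to an existing render URL without hand-editing it is sketched below. `set_query_param` is a hypothetical helper, and the value to pass is an assumption: the thread shows the toggle both as a bare key and with `=false`, so which form Grafana expects is not settled here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_param(url: str, key: str, value: str) -> str:
    """Return url with query parameter `key` set to `value`, replacing any existing one."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves parameters that have no value.
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != key]
    pairs.append((key, value))
    # urlencode re-encodes every value, so escapes may be normalized (e.g. %20 becomes +).
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Old-style render URL taken from an earlier comment in this thread.
old_url = ("https://10.10.20.7:3000/render/d-solo/dtfndb010/10-system-performance-24h-overview/"
           "?panelId=34&width=260&height=260&from=1733237940000&to=1733324340000")
print(set_query_param(old_url, "__feature.dashboardSceneSolo", "true"))
```

The same helper can set the value to `false` to compare both variants against the same panel.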

@manny4566

manny4566 commented Dec 17, 2024

When does it start failing?
After one of the last updates, either grafana-11.3.0-1.x86_64 or grafana-11.3.1-1.x86_64. Before 11.3.0 I had grafana-11.2.2-1.x86_64.

Which image renderer version are you using?
3.11.6

Is the image renderer configured as a plugin or as an external service?
Plugin on the Grafana Host

Which data sources use the panels that are failing?
InfluxDBv2

Does this issue happen randomly or can you reproduce it consistently?
fairly regularly if not always

If you grab the URL from the panel UI in Grafana 11.3.1. Does it fail?
It runs into the same timeout error.

Does adding the query param __feature.dashboardSceneSolo to the old URL make it work?
No. Example URL:
http://HOST:3000/d-solo/icinga2/icinga2-with-influxdb?from=now-3h&height=480&orgId=1&panelId=3&render=1&theme=dark&timezone=browser&to=now&var-command=load&var-disk=%24__all&var-hostname=XXX&var-service=Load+1h&var-threadcount=%24__all&width=1000&__feature.dashboardSceneSolo=false

It also fails without "&__feature.dashboardSceneSolo=false".

I hope this helps.
Thank you

@manny4566

Hi,
I think I'm getting closer to the problem.

The problem seems to be the panel ID. With 11.3.0, the format of the panel ID parameter changed:

v11.2.2
https://XXX/d/icinga2/icinga2-with-influxdb?var-hostname=XXX&var-service=Load%201h&var-command=load&from=now-3h&to=now&orgId=1&viewPanel=3

Here it is &viewPanel=3

v11.3.0
https://XXX/d/icinga2/icinga2-with-influxdb?var-hostname=XXX&var-service=Load%201h&var-command=load&from=now-3h&to=now&orgId=1&viewPanel=panel-3&timezone=browser&var-disk=$__all&var-threadcount=$__all

Here it is &viewPanel=panel-3.

Furthermore, "&timezone=browser&var-disk=$__all&var-threadcount=$__all" is now added automatically, covering variables without a defined value ($__all).
In 11.2.2 these are not added to the URL automatically.
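
The comparison of the two URLs above can also be done mechanically. This is just a sketch for illustration; `query_diff` is a hypothetical helper name, and it assumes each parameter appears only once in a URL.

```python
from urllib.parse import parse_qsl, urlsplit


def query_diff(old_url: str, new_url: str) -> dict:
    """Report query parameters that were added, removed or changed between two URLs."""
    old = dict(parse_qsl(urlsplit(old_url).query, keep_blank_values=True))
    new = dict(parse_qsl(urlsplit(new_url).query, keep_blank_values=True))
    return {
        "added": {k: new[k] for k in new if k not in old},
        "removed": {k: old[k] for k in old if k not in new},
        "changed": {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]},
    }


# The v11.2.2 and v11.3.0 URLs quoted above.
V11_2_2 = ("https://XXX/d/icinga2/icinga2-with-influxdb?var-hostname=XXX&var-service=Load%201h"
           "&var-command=load&from=now-3h&to=now&orgId=1&viewPanel=3")
V11_3_0 = ("https://XXX/d/icinga2/icinga2-with-influxdb?var-hostname=XXX&var-service=Load%201h"
           "&var-command=load&from=now-3h&to=now&orgId=1&viewPanel=panel-3"
           "&timezone=browser&var-disk=$__all&var-threadcount=$__all")

print(query_diff(V11_2_2, V11_3_0)["changed"])  # {'viewPanel': ('3', 'panel-3')}
```

For these two URLs, `added` contains `timezone`, `var-disk` and `var-threadcount`, and `removed` is empty.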

##################
As you can see, this is an Icinga2 setup. The URL is built by Icinga2, specifically by the Icinga2 Grafana plugin. I think there is a mismatch between the Icinga2 module and the Grafana panel ID parameter, which changed in 11.3.0.
The image renderer plugin may also play a part.
I use this environment:

Icinga2 2.14.3
Grafana Plugin v2.0.3 - https://github.com/Mikesch-mp/icingaweb2-module-grafana
Same problem with v3.0.0 - https://github.com/NETWAYS/icingaweb2-module-grafana

Grafana 11.3.2 (but also with v11.3.0). With v11.2.2 it works.
grafana-image-renderer v3.11.6

I hope this will help.
I will also add this comment to other Github tickets.

@Pedritod

We are also facing this issue, by doing the downgrade it works again. Thank you for the nice explanation and investigation of the bug

@okossuth

Same here. I did some tests, and there were issues rendering some panels using Grafana 11.4.0 and grafana-image-renderer 3.11.6 running as an external service. Once I downgraded Grafana to 11.2.5, rendering panels worked 100% of the time.

Projects
Status: 🗂️ Needs Triage / Escalation
Development

No branches or pull requests

8 participants