CloudStack 4.20 UI becomes unusable and every request times out about every 10 days until the server is restarted. The log contains many duplicate entries, and a single record exceeds 3 MB #10578
Comments
@ztskycn, you will probably find a reason above the log you are referring to that would explain the issue.
With the upgrade to log4j 2.x (#7131), the content of every API response is logged. This issue is caused by the massive response of the listApis API. I am not sure whether it is configurable.
@weizhouapache, yes, I believe #10567 is handling that (cc @DaanHoogland, @gpordeus)
Thanks @bernardodemarco, I had not looked into the PR.
Yes, @weizhouapache @ztskycn, this was changed (by mistake) in PR #7131. Originally, the API response logs were written to a dedicated logger/appender/file. #10567 should fix this so that the default goes back to what it used to be. One workaround to limit the size of your log lines is to set the
Using this configuration, each log line would have at most 1000 characters. Tweak this number so that you can still read the important log information while the enormous JSON logs are truncated. See https://logging.apache.org/log4j/2.x/manual/pattern-layout.html#converter-max-len for more information on how to configure your logs. In any case, is your server going down because there is no disk space left? If not, I fail to see how the log size would affect this.
I still can't find the specific problem. Is there any way to troubleshoot it? And if the log is too large, why is such a large log written at all?
As @weizhouapache and I explained, these large logs should not be printed into this file. This was a bug introduced in version 4.20.0.0, and version 4.21.0.0 should fix it. My last message has a workaround to limit the size of any log line to 1000 characters, and the log4j2 documentation may offer other workarounds if you are willing to dig into it. Regarding the issue at hand: you reported that the UI recovers after a restart. This indicates to me that the logs are not the real culprit. If the logs were filling your Management Server's storage, a simple restart would not fix the issue, and you would need to free space manually by deleting files. Therefore, the first step I would take is to pinpoint when the services became unavailable and look for errors in the logs during that period.
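To make that first step concrete, here is a minimal Python sketch that keeps only WARN/ERROR/FATAL lines from a window around the time the UI stopped responding. It assumes each log line starts with a timestamp of the form `2025-03-10 12:34:56,789` followed by the level; the function name `lines_near_outage` and the window sizes are made up for illustration and would need adjusting to your actual log layout.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: "2025-03-10 12:34:56,789 ERROR [...] message".
# Adjust the pattern if your log layout differs.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s")


def lines_near_outage(lines, outage_time, minutes_before=30, minutes_after=5,
                      levels=("WARN", "ERROR", "FATAL")):
    """Return log lines at the given levels inside a window around outage_time."""
    start = outage_time - timedelta(minutes=minutes_before)
    end = outage_time + timedelta(minutes=minutes_after)
    selected = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Continuation lines (for example stack traces) carry no
            # timestamp and are skipped in this sketch.
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if start <= stamp <= end and match.group(2) in levels:
            selected.append(line.rstrip("\n"))
    return selected
```

You could feed it an open file handle, for example `lines_near_outage(open(path, errors="replace"), datetime(2025, 3, 10, 12, 30))`, where `path` points at your management server log. Reading line by line keeps memory use low even when individual lines are several MB.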
@JoaoJandre, could an internal log buffer be the culprit (as the log size is reported to increase over time)?
@DaanHoogland I am not sure, but before going with any guesses, I would first analyze the logs and the state of the host when the problem happens. @ztskycn When this happens, what is the CPU usage of the host? RAM? Storage? Also, again, what are the logs saying?
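If it helps to capture those numbers at the moment of the outage, a small Python sketch like the following records disk usage and load averages using only the standard library. The function name `host_snapshot` is hypothetical, RAM is not covered (the standard library has no portable call for it), and the path you pass should be whichever filesystem holds your logs.

```python
import os
import shutil


def host_snapshot(path="/"):
    """Collect a few basic host metrics for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    # os.getloadavg is only available on Unix-like systems and may fail.
    getloadavg = getattr(os, "getloadavg", None)
    try:
        load = getloadavg() if getloadavg else None
    except OSError:
        load = None
    return {
        "disk_used_fraction": usage.used / usage.total,
        "disk_free_gib": usage.free / 2**30,
        "load_average": load,  # (1 min, 5 min, 15 min) or None
        "cpu_count": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in host_snapshot("/").items():
        print(f"{key}: {value}")
```

Running this from cron every few minutes and keeping the output would show whether disk or load changes in the hours before the UI stops responding.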
There are generally no major problems with the CPU and memory, and no obvious errors have been found in the logs. In one scenario, I now use a custom network computing solution (only three components: dhcp, dns, and user-static data).
@ztskycn
Your definition above should be number 4. Most of the UI pages can be opened, but a few cannot. Any operation reports a timeout directly.
If so, I think it should NOT be caused by the logs. Can you share an example of the logs?
@ztskycn Are the errors in the log similar to this? The problem description is similar to one I discovered recently. I am in the middle of searching for where the connection leak is.
Description:
I'm experiencing a critical issue with CloudStack 4.20 where all API endpoints become unresponsive approximately every 10 days. The only temporary resolution is to restart the CloudStack management server.
Observed Behavior:
API requests timeout/fail completely after ~10 days of uptime
No explicit ERROR messages in logs prior to outage
Found an unusually large INFO-level log entry (3MB per line) that might be relevant
Environment:
CloudStack Version: 4.20.0.0
OS: Ubuntu 24.04
Steps to Reproduce:
Start CloudStack management server
Operate normally for ~10 days
API services become unavailable without obvious triggers
Expected Behavior:
API endpoints should remain available continuously without requiring manual restarts.
Additional Context:
The large INFO-level log entry repeats periodically (full content attached)
No observed resource exhaustion (CPU/MEM) before outages
Problem persists across multiple maintenance windows
Troubleshooting Attempted:
Reviewed standard error logs - no smoking gun
Monitored system resources - no apparent bottlenecks
Server restart temporarily resolves the issue
Request:
Please help investigate:
Potential memory leaks or thread blocking in the 4.20 codebase
Significance of the oversized INFO log entries
Update to Original Issue:
Further analysis of the oversized INFO log reveals repetitive entries related to createVPCOffering API calls. The JSON payload in these logs appears to be abnormally large (3MB per line) and contains repetitive configuration data.
Key Log Excerpt Pattern:
INFO [c.c.a.ApiServlet] (qtp123456789-42:) {cmd="createVPCOffering", ... JSON payload (3MB) ...}
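To check which API commands actually produce the oversized entries, a rough Python sketch such as the one below can scan a log and count lines above a size threshold per command. It assumes the command name appears as `cmd="..."` as in the excerpt pattern above; the function name `summarize_oversized`, the regular expression, and the 1 MB default threshold are illustrative choices, not part of CloudStack.

```python
import re
from collections import Counter

# Assumption: the command name appears as cmd="name" (quotes optional),
# as in the excerpt pattern above.
CMD_RE = re.compile(r'cmd="?(\w+)')


def summarize_oversized(lines, threshold_bytes=1_000_000):
    """Count lines of at least threshold_bytes per command; track the largest."""
    counts = Counter()
    largest = {}
    for line in lines:
        size = len(line.encode("utf-8", errors="replace"))
        if size < threshold_bytes:
            continue
        match = CMD_RE.search(line)
        cmd = match.group(1) if match else "<unknown>"
        counts[cmd] += 1
        largest[cmd] = max(largest.get(cmd, 0), size)
    return counts, largest
```

Called as `summarize_oversized(open(path, errors="replace"))`, it returns a per-command count and the largest line size seen for each command, which should show whether the big entries come from createVPCOffering, from another command, or from several.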