Cluster of 400 applications and 4,000 pods generating 3 billion segments per day. Looking for advice. #12235
liuxinagxiang
started this conversation in
General
Replies: 2 comments
-
A healthy Elasticsearch cluster is not necessarily powerful enough for this load. Check the self-observability data, especially the OAP flush metrics. I suspect flushing is too slow, so everything eventually gets blocked.
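As a rough illustration of what checking the flush metrics could look like, here is a minimal sketch that computes the mean of a histogram from Prometheus text exposition output. The metric name `persistence_timer_bulk_execute_latency` and the sample values are assumptions for illustration only; use whatever names and numbers your own OAP telemetry endpoint actually exposes.

```python
from typing import Optional


def mean_from_histogram(exposition_text: str, metric: str) -> Optional[float]:
    """Return sum/count for a Prometheus histogram, aggregated over all label sets."""
    total_sum = 0.0
    total_count = 0.0
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labeled sample: name{labels} value [timestamp]
            head, _, tail = line.rpartition("}")
            name = head.split("{", 1)[0]
            fields = tail.split()
        else:
            # Unlabeled sample: name value [timestamp]
            parts = line.split()
            name, fields = parts[0], parts[1:]
        if not fields:
            continue
        value = float(fields[0])
        if name == f"{metric}_sum":
            total_sum += value
        elif name == f"{metric}_count":
            total_count += value
    if total_count == 0:
        return None
    return total_sum / total_count


# Hypothetical sample; real data would come from the OAP telemetry endpoint.
SAMPLE = """\
# TYPE persistence_timer_bulk_execute_latency histogram
persistence_timer_bulk_execute_latency_bucket{le="1.0"} 2
persistence_timer_bulk_execute_latency_bucket{le="+Inf"} 4
persistence_timer_bulk_execute_latency_sum 10.0
persistence_timer_bulk_execute_latency_count 4
"""

mean_latency = mean_from_histogram(SAMPLE, "persistence_timer_bulk_execute_latency")
print(mean_latency)
```

If the mean flush latency keeps growing between scrapes, storage is probably not keeping up with the write load, even when the Elasticsearch cluster reports itself as healthy.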
-
Is your SkyWalking deployed in Kubernetes? If so, could you share the relevant documentation? I have been making similar changes recently.
-
My project generates about 3 billion segments a day. The current architecture is skywalking-agent --> kafka --> skywalking-L1-L2 --> ES.
![pic6](https://private-user-images.githubusercontent.com/34936970/331653962-dfe1e2bc-dc27-44f8-8bf2-9801d4c3ff4b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5Mjk1MTgsIm5iZiI6MTcxOTkyOTIxOCwicGF0aCI6Ii8zNDkzNjk3MC8zMzE2NTM5NjItZGZlMWUyYmMtZGMyNy00NGY4LThiZjItOTgwMWQ0YzNmZjRiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAyVDE0MDY1OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTE0NWE1YmU2M2U0NGVmMTUyYjQzMjMwNjk2ZWNkNjJiOGQxNDg1NTljZGFmMjU5NTc4OGQzOTMzYzliMWU3ZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.qil8o5cxoK_iCBYR7GazE8aOfmUd2py32S_ioUjRrLE)
The scale is as follows: 400 microservices; nearly 4,000 pods; Kafka: 3 nodes; skywalking-L1: 6 nodes (Xmx8G Xms8G Xmn3G); skywalking-L2: 6 nodes (Xmx8G Xms8G Xmn3G); elasticsearch: 6 nodes (Xmx16G Xms16G Xmn6G).
Versions: Kafka 3.7.0, SkyWalking 9.2, Elasticsearch 7.17.18.
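To put these numbers in perspective, here is a back-of-the-envelope sketch of the average ingest rate. It assumes traffic is spread evenly over the day and evenly across the 6 L1 nodes, which is a simplification: real peaks will be higher than this average.

```python
SEGMENTS_PER_DAY = 3_000_000_000  # from the post: about 3 billion segments a day
SECONDS_PER_DAY = 24 * 60 * 60
L1_NODES = 6                      # from the post: skywalking-L1 has 6 nodes


def average_rate(segments_per_day: int, nodes: int) -> tuple[float, float]:
    """Return (cluster-wide, per-node) average segments per second.

    Assumes a uniform load over the day and across nodes (an assumption,
    not a measurement).
    """
    cluster = segments_per_day / SECONDS_PER_DAY
    return cluster, cluster / nodes


cluster_rate, per_node_rate = average_rate(SEGMENTS_PER_DAY, L1_NODES)
print(f"{cluster_rate:,.0f} segments/s cluster-wide, {per_node_rate:,.0f} per L1 node")
# prints "34,722 segments/s cluster-wide, 5,787 per L1 node"
```

So on average each L1 node has to receive roughly 5,800 segments per second, and Elasticsearch has to absorb the resulting writes at a matching pace.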
The current problem is that the Kafka and Elasticsearch clusters are normal, but the SkyWalking L1 and L2 clusters keep reporting errors, and the SkyWalking OAP service consumes data very slowly. After each restart, Kafka data is consumed normally for about ten minutes, and then various timeout errors are reported. So I have moved L2 to a containerized deployment on a k8s cluster to avoid manually restarting OAP every time 😅
I want to know how to plan the cluster according to the size of my project. Which configurations can be optimized? Does anyone have any suggestions?
I was referring to this article recently: https://skywalking.apache.org/zh/2022-08-30-pingan-jiankang/