Node Stability #194
I haven't been able to reproduce this, so I am closing it because I assume it was a one-off issue.
Reopened due to user report.
Running on a VPS with a Ryzen 7 8-core processor, 128 GB of memory, and 2 GB SSD RAID storage. The node didn't seem to start syncing. It was restarted a couple of times, but then after a week I tried restarting it again and now it is syncing: http://95.217.121.243:9922/blocks/height. I can only speculate that something goes into a deadlock when the chain has no blocks yet. Here is the log:
A second issue: the log leaks the private key, about eight lines from the top.
@stalker-loki I didn't find any clues in the log that would explain why syncing stopped. I suspect it could sometimes be a problem of poor network connectivity, especially when the machine is behind a large firewall. We will keep looking for other possible causes. In the meantime, I recommend optimizing the network and adding more peers.
For issue 2: if this is a supernode, I suggest using a cold wallet to receive rewards. The wallet address in your log file is then used only for minting, so don't keep any balance in it. If you generate a wallet first and then start the node, the private key won't show up in the log. By the way, cold wallet minting is one of V Systems Chain's advantages: you simply fill in the reward address in the config file with a cold wallet address, and rewards go into the cold wallet rather than the minting address. This keeps your funds safe.
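For illustration only, a minimal sketch of the relevant part of vsys.conf. The vsys.miner.reward-address key path is recalled from the mainnet config template and is an assumption here; verify it against your own config before relying on it:

vsys {
  miner {
    enable = yes
    # Assumed key name: rewards are paid to this (cold wallet) address
    # instead of the node's own minting address.
    reward-address = "your cold wallet address here"
  }
}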
That's this one, down at the bottom of this section: ?
Yes.
I run several VSYS full nodes, some on mainnet, some on testnet. One of them, @stalker-loki runs on my behalf. They stay up, but stop syncing. Sometimes they reach a fully synced state and run for a while at the chain's current height. Other times, they just stop. All of my VSYS full nodes are in top-tier datacenters, specifically the hetzner.de datacenters in Germany and Finland. There are other blockchain nodes on those machines. The other chains run happily and without interruption. Unfortunately, VSYS does not. The Ethereum blockchain weighs in at 236 GB; on my Hetzner node, I'm able to sync it in about 12 hours. VSYS weighs in at ~10 GB, and sync takes 24 hours. Additionally, in my experience VSYS full nodes aren't very stable. This is the spec of my server. It is in a professionally run datacenter, and it's highly unlikely that there are network issues. I run additional nodes on Hetzner machines, and VSYS is the only one that frequently either stops syncing during initial standup, or stops advancing block height after it has already synced. I've observed VSYS losing sync on machines at my home as well, where I also run nodes for other blockchains that do not lose sync. Today, I was attempting to record a video on one of VSYS' unique concepts, the minting average balance (MAB). Unfortunately, across several nodes, I was unable to discern whether the node had simply gone down, or whether my once-every-one-to-two-seconds API request had crashed it:
I was just using that to show the increase in MAB; I think it makes a great visual and conversation starter, since liquid staking is such a hot topic right now. Interestingly, on VSYS we already have liquid staking, no derivatives needed. Anyhow, I was unable to complete my video, because the node had crashed:
As you can see, the block height stopped advancing on June 15th at block 12146933. Looking at that node's address's transaction logs, I did not see anything happen around the 15th. The last transactions that went through that node are here: https://explorer.v.systems/address/ARKYdc1pgGSefeCjNNkzWwnoKBVVMDYzex7 That's the node I used for the db put tutorial, but my db put transactions were on June 9th, so I'm forced to conclude that this is a case of garden-variety instability. Next, we can look at another node that I run, this one on testnet 0.3. I wanted to try my MAB queries against it. First, from my web browser in the Swagger UI, I confirmed that it had stopped. Then I logged into my machine at Hetzner and I:
So, after restarting the service, the node began to sync again. By this point, I'd gotten pretty curious about the stability issues; I'd noticed stability problems on three machines. So I put up another mainnet node on a Hetzner server, in Docker, on port 8822. While I was writing this issue, it crashed in exactly the manner that this issue describes:
The API stays up, but the block height stops advancing. This server at Hetzner also runs a full node for Whaleshares, an application-specific blockchain focused on social media. My Whaleshares node never skips a beat, staying in sync with the chain's three-second block time. Here's my Ethereum node, again on the same machine: This has been quite a long issue, but I decided that it was necessary to provide exhaustive evidence that at present there are at least two stability problems with VSYS nodes:
I chose to compare with two of the other chains that I run nodes for because, unfortunately, my VSYS nodes are the only ones that exhibit this particular issue. It is not restricted to machines running in Hetzner datacenters: it also affects VSYS nodes that I have attempted to run at various times on my personal Mac laptop, a home server, and a node on vultr.com. Log files are available on Slack.
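For reference, the symptom described above ("API stays up, block height stops advancing") can be checked by hand against the /blocks/height endpoint quoted earlier in this thread. This is only a sketch, using the public node address from the first report in place of your own host and port:

curl -s http://95.217.121.243:9922/blocks/height
sleep 120
# If the second response reports the same height while the API still answers,
# the node is up but no longer syncing.
curl -s http://95.217.121.243:9922/blocks/height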
I thought that this might be helpful in troubleshooting. The node mentioned above, which is stuck at the block height noted earlier, is showing this:
curl -X GET "http://95.217.121.243:9922/peers/all" -H "accept: application/json"
{
"peers": [
{
"address": "/3.121.94.10:9921",
"lastSeen": 9223372036854776000
},
{
"address": "/13.52.40.227:9921",
"lastSeen": 9223372036854776000
},
{
"address": "/13.55.174.115:9921",
"lastSeen": 9223372036854776000
},
{
"address": "/13.113.98.91:9921",
"lastSeen": 9223372036854776000
}
]
}

Healthy VSYS mainnet nodes typically have 34 or 35 peers.

root@buildbox ~ # curl -X GET "https://wallet.v.systems/api/peers/all" -H "accept: application/json"
{
"peers" : [ {
"address" : "/13.52.96.166:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/35.177.188.74:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/138.197.196.78:9921",
"lastSeen" : 1593266885446
}, {
"address" : "/54.95.22.119:9921",
"lastSeen" : 1593349688901
}, {
"address" : "/3.121.94.10:9921",
"lastSeen" : 1593349688178
}, {
"address" : "/13.115.105.184:9921",
"lastSeen" : 1593349688909
}, {
"address" : "/3.104.62.227:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/34.196.27.234:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/3.17.31.9:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/165.227.64.201:9921",
"lastSeen" : 1593266908452
}, {
"address" : "/52.60.124.131:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/18.191.26.101:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/35.180.246.64:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/45.76.155.8:9921",
"lastSeen" : 1593349688889
}, {
"address" : "/52.35.120.221:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/54.92.10.151:9921",
"lastSeen" : 1593349688190
}, {
"address" : "/13.52.40.227:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/3.16.244.131:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/54.69.23.204:9921",
"lastSeen" : 9223372036854775807
}, {
"address" : "/3.17.187.179:9921",
"lastSeen" : 1593349688147
} ]
}
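As a rough health check, the size of that peer list can be compared against the healthy baseline by counting the entries returned by /peers/all. This is only a sketch, assuming jq is installed; the host and port are those of the stuck node quoted above:

# Count peers known to the node; the stuck node above reports only 4,
# while a healthy mainnet node reports far more.
curl -s "http://95.217.121.243:9922/peers/all" | jq '.peers | length'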
This issue may be caused by the connectivity of the known peers, and users may be able to work around it with the following strategy:
Run the node directly with java -jar v-systems-v***.jar vsys.conf. I have done this on more than 10 machines (last week), and all of them have synced well up to now. So I guess the issue may be in the .deb service (it may need some extra network-related rights/resources).
In conclusion, most stability issues (for services in general, not only the V node) are caused by too few system resources being allocated. If users run other services on the same machine with higher resource allocations, services with lower resource allocations may be starved. To avoid this, one can explicitly allocate more resources to the service. For example, if you run the V node with Java directly, you can give it more memory with java -Xmx4096m -jar ***.jar **.conf and allocate more threads with java -Dscala.concurrent.context.maxExtraThreads=1024 -jar ***.jar **.conf (a combined invocation is sketched below). Finally, I will not tag this issue as a bug.
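A combined invocation of the two options above, as a sketch only; the heap size and thread count are examples, and the jar and config file names are placeholders to adjust for your installation:

# Run the node with a 4 GB heap and a larger pool of extra threads for
# Scala's default ExecutionContext.
java -Xmx4096m \
     -Dscala.concurrent.context.maxExtraThreads=1024 \
     -jar v-systems-v***.jar vsys.conf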
If there is an issue in the .deb file that we ship to users that causes instability and downtime, it's a problem. I mean, we can triage and label this however we would like, but we are shipping the .deb to users, so it is very important that it either works properly or is not shipped.
Platform:
Configuration:
I followed these directions exactly:
https://github.com/virtualeconomy/v-systems/wiki/How-to-Install-V-Systems-Mainnet-Node
Problem:
The node stopped syncing at block 6126862, and the systemd log showed that it was handshaking with just one peer, over and over.
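One way to watch for this in the systemd journal, as a sketch: it assumes the service unit is named vsys, as in the restart command below, and the exact log wording may differ on your node.

# Follow the node's journal and filter for handshake messages to see
# whether it keeps handshaking with the same single peer.
journalctl -u vsys -f | grep -i handshake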
Resolution:
I ran:
systemctl restart vsys
and the node began to sync again. I also created a teeny tiny sync monitor tool:
Users may want to restrict their API to localhost for security reasons, and this allows them to easily monitor sync progress, albeit in a very basic way.
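The original tool isn't reproduced above; a minimal sketch of that kind of monitor, polling /blocks/height on localhost, might look like the following. The host, port, polling interval, and the use of jq for parsing the response are assumptions.

#!/usr/bin/env bash
# Poll the local node's block height and report when it stops advancing.
URL="http://127.0.0.1:9922/blocks/height"
PREV=-1
while true; do
  HEIGHT=$(curl -s "$URL" | jq -r '.height')
  if [ "$HEIGHT" = "$PREV" ]; then
    echo "$(date): height stuck at $HEIGHT"
  else
    echo "$(date): height $HEIGHT"
  fi
  PREV="$HEIGHT"
  sleep 60
done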