contracts doesn't change to the grace period when TFT balance isn't enough #1415

Closed
A-Harby opened this issue Nov 14, 2023 · 34 comments

@A-Harby
Contributor

A-Harby commented Nov 14, 2023

Description

Testnet 2.2.0-rc4.

I rented a node while I had enough TFTs. After my balance reached zero, the rent contract did not move to the grace period, even though I waited for more than 3 days.

image
image
image

@A-Harby A-Harby added the type_bug Something isn't working label Nov 14, 2023
@A-Harby
Contributor Author

A-Harby commented Nov 14, 2023

image
image

@ramezsaeed ramezsaeed added this to the 2.4.0 milestone Nov 15, 2023
@sameh-farouk
Member

sameh-farouk commented Nov 15, 2023

I looked into this issue and will compile my debug session into this comment.

General info collected:

Network: Testnet
runtime: 146
node id: 189
node twin id: 465
farm id: 1
user twin id: 755
rent contract id: 25,402

  • The twin account used to create the rent contract remains unfunded (for a few days now) and has a very tiny amount of TFTs (0.0000501 TFT).
  • The rent contract was created approximately six days ago, specifically on Thursday, November 9, 2023, at 1:35:42 PM GMT.
  • The contract is still in the created state.
  • The node’s power status is up.
  • The contract ID normally exists in the billing loop at index 202 (contract ID % billing frequency).
  • The billing frequency is set to its default of 600, as expected.
  • Worth mentioning: this node has no extra fee set.
  • The grid proxy shows the node is up.
  • No workloads or deployments were created on that node.
  • Uptime reports keep coming in; the last one was 36 minutes before the time of writing this comment.
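
The billing-loop index in the list above follows from the modulo relation stated there. A minimal Python sketch using the values from this debug session; the function name `billing_slot` is hypothetical and is not taken from the tfchain code:

```python
def billing_slot(contract_id: int, billing_frequency: int = 600) -> int:
    """Return the billing-loop index for a contract (contract ID % billing frequency)."""
    if billing_frequency <= 0:
        raise ValueError("billing_frequency must be positive")
    return contract_id % billing_frequency


# Rent contract 25402 with the default billing frequency of 600.
slot = billing_slot(25402)
print(slot)
```

With these inputs the result is consistent with index 202 reported in the list.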

Things that seem off to me:

  • The lock has never been updated since the contract creation.
smartContractModule.contractLock: PalletSmartContractContractLock
{
  amountLocked: 0
  extraAmountLocked: 0
  lockUpdated: 1,699,536,942
  cycles: 0
}

More Data observed during debugging:

smartContractModule.contracts: Option<PalletSmartContractContract>
{
  version: 4
  state: Created
  contractId: 25,402
  twinId: 755
  contractType: {
    RentContract: {
      nodeId: 189
    }
  }
  solutionProviderId: null
}
system.account: FrameSystemAccountInfo
{
  nonce: 614
  consumers: 0
  providers: 1
  sufficients: 0
  data: {
    free: 501
    reserved: 0
    frozen: 0
    flags: 170,141,183,460,469,231,731,687,303,715,884,105,728
  }
}
{
  "data": {
    "rentContracts": [
      {
        "id": "0010812759-000002-f78d4",
        "nodeID": 189,
        "solutionProviderID": 0,
        "state": "Created",
        "twinID": 755,
        "gridVersion": 4,
        "createdAt": "1699536942",
        "contractID": "25402"
      }
    ]
  }
}
tfgridModule.nodePower: TfchainSupportNodePower
{
  state: Up
  target: Up
}
smartContractModule.contractLock: PalletSmartContractContractLock
{
  amountLocked: 0
  extraAmountLocked: 0
  lockUpdated: 1,699,536,942
  cycles: 0
}
smartContractModule.contractsToBillAt: Vec<u64>
[
  202
  25,402
]
smartContractModule.billingFrequency: u64
600

smartContractModule.dedicatedNodesExtraFee: u64
0
tfgridModule.nodes: Option<TfchainSupportNode>
{
  version: 5
  id: 189
  farmId: 1
  twinId: 465
  resources: {
    hru: 2,000,398,934,016
    sru: 512,110,190,592
    cru: 8
    mru: 16,746,598,400
  }
  location: {
    city: Unknown
    country: Belgium
    latitude: 50.8509
    longitude: 4.3447
  }
  publicConfig: null
  created: 1,647,950,826
  farmingPolicyId: 1
  interfaces: [
    {
      name: zos
      mac: 0c:c4:7a:51:e3:78
      ips: [
        10.5.0.42
        2a02:1802:5e:15:2d7a:428d:4a0d:7363
      ]
    }
  ]
  certification: Diy
  secureBoot: false
  virtualized: false
  serialNumber: 0123456789
  connectionPrice: 80
}
tfgridModule.twins: Option<PalletTfgridTwin>
{
  id: 465
  accountId: 5CzQQjXhDTheDDEX5zcyuragah64eojfKVr78C7evL9dBNKX
  relay: relay.test.grid.tf
  entities: []
  pk: 0x03be73bc928cb5f17a1cb6827eff0736b1d7671e06e28e92f7e7fdde698d7621db
}
query MyQuery {
  uptimeEvents(where: {nodeID_eq: 189}, orderBy: timestamp_DESC, limit: 1) {
    id
    timestamp
    uptime
    nodeID
  }
}

@sameh-farouk
Member

Further investigation will be at threefoldtech/tfchain#893

@sameh-farouk sameh-farouk self-assigned this Nov 16, 2023
@sameh-farouk sameh-farouk moved this to In Progress in 3.14.x Nov 22, 2023
@sameh-farouk
Member

This is now resolved.
The contract should now be in the correct state.
@A-Harby feel free to verify and close.
For more info, see here:
threefoldtech/tfchain#893 (comment)

@A-Harby
Contributor Author

A-Harby commented Nov 23, 2023

@sameh-farouk Indeed, the rent contract 25402 is now being billed and changed into the grace period status.
image

But the other rent contract (26338) is still in the Created state and has not been billed since it was created.
image

Contract details

{
  "contractId": 26338,
  "twinID": 613,
  "type": "rent",
  "state": "Created",
  "createdAt": "11/20/2023, 8:29:30 PM",
  "nodeId": 205,
  "solutionProviderID": 0,
  "solutionName": "-",
  "solutionType": "-",
  "expiration": "-",
  "consumption": "No Data Available"
}

@sameh-farouk
Member

sameh-farouk commented Nov 23, 2023

@A-Harby It is the same issue, but with another validator, 5GNE..
As I stated in the linked issue in my previous comment ..

The remaining work needed to verify and fix the validators’ setup can be tracked in the ops ticket mentioned in the previous comment.

@sameh-farouk
Member

@A-Harby According to operations, the devnet, qanet, and testnet validators were checked and some issues were corrected. Can you verify this now for rent and node contracts?

@A-Harby
Contributor Author

A-Harby commented Dec 17, 2023

Yes, one of the contracts started billing, and its status changed to grace period.
image

But the other contract (the rent contract) is still not being billed on Qanet.
image

@sameh-farouk
Member

sameh-farouk commented Dec 18, 2023

@A-Harby 17516 on qa is a node contract, not a rent contract.
How did you create this contract? It looks like it was not sent to the node after creation. Can you check whether you can access this deployment, to verify that the workload is really deployed?

If you used the playground to create this deployment, please copy and paste the JSON deployment details here.

@A-Harby
Contributor Author

A-Harby commented Dec 18, 2023

@A-Harby 17516 on qa is a node contract, not a rent contract. How did you create this contract? It looks like it was not sent to the node after creation. Can you check whether you can access this deployment, to verify that the workload is really deployed?

If you used the playground to create this deployment, please copy and paste the JSON deployment details here.

I can't get deployment data; however, I tried to.
image

image

@A-Harby A-Harby moved this from In Progress to In Verification in 3.14.x Dec 18, 2023
@sameh-farouk
Member

sameh-farouk commented Dec 18, 2023

I can't get deployment data; however, I tried to.

It's good to hear that the other contracts are behaving as expected now.
Regarding the contract with ID 17516, there is nothing to bill. It has no IPs or CU/SU/NU usage. So from TFChain's perspective, billing is acting as expected.

The expected process is:

  • After the contract is created on-chain, the client notifies the node about it.
  • The node reports the contract’s resource consumption to tfchain via report_contract_resources().
  • This sets the “used” capacity on-chain for the contract.
  • This occurs when the workload is created, modified, or removed.
  • The node also reports NRU by calling the add_reports extrinsic.

Then any billing (except for IP reservation) is based on the reserved resources and NRU consumption which are reported by the node.
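
To make the rule above concrete, here is a rough Python sketch of the "is there anything to bill?" check for a node contract. The `ContractUsage` class and its field names are hypothetical stand-ins for the on-chain data discussed in this thread (reserved IPs, the node-reported "used" capacity, and reported NU), not tfchain types; rent contracts are out of scope here.

```python
from dataclasses import dataclass


@dataclass
class ContractUsage:
    """Hypothetical snapshot of the billing-relevant data for one node contract."""
    public_ips: int = 0      # reserved IPs (billed regardless of node reports)
    hru: int = 0             # "used" capacity reported by the node
    sru: int = 0
    cru: int = 0
    mru: int = 0
    nru_reported: int = 0    # network usage reported by the node


def has_anything_to_bill(usage: ContractUsage) -> bool:
    """True if the contract has reserved IPs, reported resources, or reported NU."""
    has_resources = any((usage.hru, usage.sru, usage.cru, usage.mru))
    return usage.public_ips > 0 or has_resources or usage.nru_reported > 0


# A contract whose node never reported anything and that holds no IPs.
print(has_anything_to_bill(ContractUsage()))
```

Under this sketch, a contract with all-zero usage and no IPs yields nothing to bill, which is the situation described for contract 17516.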

But for this specific contract, it looks like the workload wasn't deployed, given the observations below:

  • The NodeContractResources object does not exist for this contract ID (the most likely reason is that the node was not notified about this contract after its creation).
smartContractModule.nodeContractResources: PalletSmartContractContractResources
{
  contractId: 0
  used: {
    hru: 0
    sru: 0
    cru: 0
    mru: 0
  }
}
  • There is no NU consumption report, and lastUpdated equals the contract creation time (it was never updated).
smartContractModule.contractBillingInformationByID: PalletSmartContractContractBillingInformation
{
  previousNuReported: 0
  lastUpdated: 1,700,575,914
  amountUnbilled: 0
}
  • In addition, the OP can't get the contract info, can't find the deployment info, and can't see the solution listed on the dashboard. So something went wrong during this deployment, which I can't help with without more information.

Anyway, I believe this is unrelated to the original billing issue discussed here. Please open a separate issue for it in the relevant repo (based on the way you deployed this).

@sameh-farouk
Member

sameh-farouk commented Dec 18, 2023

Some recommendations for the verification process:

  • Verification should cover at least these cases:

    • All rent contracts are billed periodically.
    • All node contracts that have at least one IP reserved are billed periodically.
    • All node contracts that have a nodeContractResources entry associated with their ID and/or NU consumption are billed periodically.
  • Verification should also be done on each network separately:

    • Devnet
    • Qanet
    • Testnet
    • Mainnet

Note: billed periodically = the due amount is locked every billing cycle (600 blocks or ~1 hour), then deducted every 24 billing cycles (~24 hours).
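
The timing in the note can be sanity-checked with a few lines of Python. The 6-second block time is an assumption inferred from "600 blocks or ~1 hour"; the constant and function names are illustrative only.

```python
BLOCK_TIME_SECONDS = 6          # assumption, inferred from "600 blocks or ~1 hour"
BILLING_FREQUENCY_BLOCKS = 600  # blocks per billing cycle, as stated in the note
CYCLES_PER_DEDUCTION = 24       # locked amounts are deducted every 24 cycles


def cycle_duration_seconds(blocks: int = BILLING_FREQUENCY_BLOCKS,
                           block_time: int = BLOCK_TIME_SECONDS) -> int:
    """Wall-clock length of one billing cycle, in seconds."""
    return blocks * block_time


def deduction_interval_seconds(cycles: int = CYCLES_PER_DEDUCTION) -> int:
    """Time between two deductions of the locked amount, in seconds."""
    return cycles * cycle_duration_seconds()


print(cycle_duration_seconds() / 3600)       # hours per billing cycle
print(deduction_interval_seconds() / 3600)   # hours between deductions
```

For verification, this suggests waiting at least one full cycle after funds run out before concluding that a contract is not being billed.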

@xmonader
Contributor

@sameh-farouk Thank you!

@ramezsaeed Hope you have these scenarios covered, if so, please close the issue with the link to the scenarios.

@xmonader xmonader modified the milestones: 2.4.0, 2.3.0 Dec 25, 2023
@xmonader xmonader moved this to In Verification in 3.13.x Dec 25, 2023
@khaledyoussef24
Contributor

I tried the recommended scenarios, and the issue still exists.
image
The contracts were created and stayed for more than an hour while the wallet had no TFTs, but the contracts did not go to the grace period.
image

@khaledyoussef24 khaledyoussef24 moved this from In Verification to Accepted in 3.13.x Jan 3, 2024
@A-Harby
Contributor Author

A-Harby commented Jan 3, 2024

Devnet, b75e477.
Same here: I also have some contracts that haven't started billing yet (more than 2 hours have passed), even though some of them have changed their status to grace period.
image

Also, I can still SSH into them.
image

@sameh-farouk
Member

Okay, the contracts in the grace period were processed successfully, so these have no issues and can be skipped. Next, I'm checking contract 54243 to see what is happening.

@sameh-farouk
Member

sameh-farouk commented Jan 3, 2024

Contract 54243 should be billed by this validator:
5DZVVFwJyEKEAUtrwmvPxkWzvjtFc73LybFK8xJse57eBP4X

I will have to check with ops regarding this one and will update the related operations issue.

Next, I'll check the other VM contract, 54242.

@sameh-farouk
Member

sameh-farouk commented Jan 3, 2024

The other one, 54242, seems not to be deployed at all, so it can't be billed.

{
  contractId: 54,242
  used: {
    hru: 0
    sru: 0
    cru: 0
    mru: 0
  }
}

Is the VM reachable? I guess not.
Can you also paste the JSON file for this deployment?

If you can't reach the deployment, I believe you need to open an issue for this one, describing how you created this contract and that the deployment was not provisioned.

@A-Harby
Contributor Author

A-Harby commented Jan 3, 2024

is the VM reachable ?

Yes, this is the VM I ssh in my comment above: #1415 (comment)

@sameh-farouk
Member

sameh-farouk commented Jan 3, 2024

Yes, this is the VM I ssh in my comment above: #1415 (comment)

No, the one you posted is for 54,243, not 54,242. Please double-check.

@sameh-farouk
Member

Next, I'll check @khaledyoussef24's contracts 54250 and 54251.

@A-Harby
Contributor Author

A-Harby commented Jan 3, 2024

No, the one you posted is for 54,243, not 54,242. Please double-check.

I can't list it or find the node contract; I will report it in another issue.
image

@ramezsaeed
Contributor

For the validator design, I suggest the following:

  • In the devnet case, we have 4 validators.
  • If a block goes to a validator that is not working properly, the VM will never get billed, right?
  • So I suggest that the validator keeps this block for only one period and then releases it, so that another validator can check it in the next period and bill it.
  • This way, we don't get blocked by any validator that is turned off or no longer works.
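
One way to picture the suggestion above is a simple rotation: after each period in which the slot goes unbilled, responsibility moves to the next validator. This is only a hypothetical Python illustration of the proposal, not how tfchain assigns billing; the function name and the indexing of validators are assumptions.

```python
def responsible_validator(slot_owner: int, missed_cycles: int, validator_count: int) -> int:
    """Index of the validator that should bill a slot after `missed_cycles` unbilled periods."""
    if validator_count <= 0:
        raise ValueError("validator_count must be positive")
    # Rotate to the next validator for every period the slot was left unbilled.
    return (slot_owner + missed_cycles) % validator_count


# Devnet-like setup with 4 validators; validator 2 owns the slot but is not billing.
for missed in range(4):
    print(missed, responsible_validator(2, missed, 4))
```

With such a rotation, a single broken validator would delay billing by one period instead of blocking it indefinitely.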

@sameh-farouk
Member

sameh-farouk commented Jan 4, 2024

For the validator design, I suggest the following:

  • In the devnet case, we have 4 validators.
  • If a block goes to a validator that is not working properly, the VM will never get billed, right?
  • So I suggest that the validator keeps this block for only one period and then releases it, so that another validator can check it in the next period and bill it.
  • This way, we don't get blocked by any validator that is turned off or no longer works.

I agree. The billing logic needs to be improved in many aspects, not only this one. I am already researching this.

@sameh-farouk
Member

sameh-farouk commented Jan 8, 2024

@khaledyoussef24 From tfchain's perspective, the deployment for contract 54250 has not been provisioned yet. Can you access this VM?

For the other contract, 54251, I can confirm it is not paying anything.

@sameh-farouk
Member

sameh-farouk commented Jan 8, 2024

Contract 54251 is on rented node 149. The contract itself is handled correctly by validator 5Evb.. and is billed nothing, as expected. The rent contract for this node is 54249, so that is the one that should be checked instead. Checking it ..

@sameh-farouk
Member

Okay, this 54251 leads to validator 5FXAK4MktAf5dQZURaC5sdRWZNgJPr8zB6iFr3Cbi1j4Svjv, which is not billing any contracts in its slot. I have already asked the operations team in the related ticket to check it.

@sameh-farouk
Member

sameh-farouk commented Jan 9, 2024

After tracing the contracts to the problematic validators and checking on them, my initial finding is that the is_next_block_author method fails to detect the validator key in these nodes' local store.

I'm investigating it further and will try to come up with a fix soon.
A more specific issue can be found here: threefoldtech/tfchain#932

@sameh-farouk sameh-farouk moved this from Accepted to In Progress in 3.13.x Jan 16, 2024
@sameh-farouk
Member

sameh-farouk commented Jan 16, 2024

Contracts 54251 and 54249 went to the grace period after setting a new session key for the validator 5FXAK4MktAf5dQZURaC5sdRWZNgJPr8zB6iFr3Cbi1j4Svjv to meet a certain assumption, as described in threefoldtech/tfchain#932.

smartContractModule.contracts: Option<PalletSmartContractContract>
{
  version: 4
  state: {
    GracePeriod: 9,246,850
  }
  contractId: 54,249
  twinId: 5,822
  contractType: {
    RentContract: {
      nodeId: 149
    }
  }
  solutionProviderId: null
}

smartContractModule.contracts: Option<PalletSmartContractContract>
{
  version: 4
  state: {
    GracePeriod: 9,246,850
  }
  contractId: 54,251
  twinId: 5,822
  contractType: {
    NodeContract: {
      nodeId: 149
      deploymentHash: aed69a2c07ee962def015fe94f11ea8d
      deploymentData: {type:vm,name:vmw9uf7,projectName:fullvm/vmw9uf7}
      publicIps: 0
      publicIpsList: []
    }
  }
  solutionProviderId: 1
}

The same happened for contract 54243 after applying the same process to validator 5DZVVFwJyEKEAUtrwmvPxkWzvjtFc73LybFK8xJse57eBP4X

smartContractModule.contracts: Option<PalletSmartContractContract>
{
  version: 4
  state: {
    GracePeriod: 9,246,844
  }
  contractId: 54,243
  twinId: 291
  contractType: {
    NodeContract: {
      nodeId: 16
      deploymentHash: d78a91992807d3253163fd5482b56aa5
      deploymentData: {type:vm,name:vmk176a,projectName:fullvm/vmk176a}
      publicIps: 0
      publicIpsList: []
    }
  }
  solutionProviderId: 1
}

Devnet should be fixed now. QAnet is not suffering from this issue.
We will apply the same process to mainnet tomorrow; testnet will follow.

@sameh-farouk
Member

sameh-farouk commented Jan 20, 2024

@ramezsaeed @A-Harby @khaledyoussef24
Should be fixed now on all networks. I'll move the issue to verification.

#1415 (comment)

@sameh-farouk sameh-farouk moved this from In Progress to In Verification in 3.13.x Jan 20, 2024
@A-Harby
Contributor Author

A-Harby commented Feb 12, 2024

Verified,
I kept trying to reproduce the issue whenever I deployed an instance of any solution, and I even used different projects and clients (dashboard, TS client, mass deployer, and others) to help me reproduce the issue again, but till now it has not shown up again.

The mass deployer helped me the most with the issue because it could deploy a large amount of VMs all at once, ensuring that every validator will be assigned at least once to a contract, and after waiting for a few minutes, I was able to verify that all of them are being billed and converted to the grace period status once the money is not enough.
image
image
image

All of these tests have been done over the four networks through different clients.
And I also added the test cases for them to make sure it is tested with every release across the networks.

@sameh-farouk
Member

sameh-farouk commented Feb 13, 2024

Verified,
I kept trying to reproduce the issue whenever I deployed an instance of any solution, and I even used different projects and clients (dashboard, TS client, mass deployer, and others) to help me reproduce the issue again, but till now it has not shown up again.
The mass deployer helped me the most with the issue because it could deploy a large amount of VMs all at once, ensuring that every validator will be assigned at least once to a contract, and after waiting for a few minutes, I was able to verify that all of them are being billed and converted to the grace period status once the money is not enough.
All of these tests have been done over the four networks through different clients.
And I also added the test cases for them to make sure it is tested with every release across the networks.

It's a relief that we finally wrapped up the contracts' billing issue. It was a tricky one, since the reports of unbilled contracts from the testing team had different causes and we had to debug each set separately.

@github-project-automation github-project-automation bot moved this from In Verification to Done in 3.13.x Feb 13, 2024
@A-Harby
Contributor Author

A-Harby commented Feb 13, 2024

@sameh-farouk, I updated the comment with the test cases if you could review them and check if it's enough to cover all the cases, and if not, please comment with a missing or incorrect one.

@sameh-farouk sameh-farouk changed the title Rent contract doesn't change to the grace period when TFT balance isn't enough contracts doesn't change to the grace period when TFT balance isn't enough Feb 13, 2024
@sameh-farouk
Member

@A-Harby I updated the test cases docs.
