Add gpu stats endpoint for internal monitoring #109

Merged: 7 commits into main from features/add-gpu-stats-endpoint on Feb 24, 2024

@Redm4x Redm4x commented Feb 22, 2024

Added two API endpoints for internal monitoring (#106).

https://api.cloudmos.io/internal/gpu

Returns a summary of the GPUs on the network.

Example Response

{
  "gpus": {
    "total": {
      "allocatable": 2,
      "allocated": 0
    },
    "details": {
      "nvidia": [
        {
          "model": "t4",
          "ram": "16Gi",
          "interface": "PCIe",
          "allocatable": 1,
          "allocated": 0
        },
        {
          "model": "rtx3060ti",
          "ram": "8Gi",
          "interface": "PCIe",
          "allocatable": 1,
          "allocated": 0
        }
      ]
    }
  }
}
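
As a rough illustration of consuming this response, the sketch below computes the number of unallocated GPUs per vendor, model and RAM size. The helper names (`free_gpus_by_model`, `totals_match`) are hypothetical and not part of the PR. The check that `total` equals the sum of the `details` entries is an assumption that holds for the example response above, not a documented guarantee.

```python
import json
from typing import Dict

# Payload copied from the example response above.
SAMPLE_RESPONSE = json.loads("""
{
  "gpus": {
    "total": {"allocatable": 2, "allocated": 0},
    "details": {
      "nvidia": [
        {"model": "t4", "ram": "16Gi", "interface": "PCIe",
         "allocatable": 1, "allocated": 0},
        {"model": "rtx3060ti", "ram": "8Gi", "interface": "PCIe",
         "allocatable": 1, "allocated": 0}
      ]
    }
  }
}
""")


def free_gpus_by_model(response: dict) -> Dict[str, int]:
    """Map "vendor/model/ram" to the number of unallocated GPUs."""
    free: Dict[str, int] = {}
    for vendor, entries in response["gpus"]["details"].items():
        for entry in entries:
            key = f"{vendor}/{entry['model']}/{entry['ram']}"
            free[key] = free.get(key, 0) + entry["allocatable"] - entry["allocated"]
    return free


def totals_match(response: dict) -> bool:
    """Check that "total" equals the sum over "details" (assumed invariant)."""
    entries = [e for group in response["gpus"]["details"].values() for e in group]
    total = response["gpus"]["total"]
    return (
        total["allocatable"] == sum(e["allocatable"] for e in entries)
        and total["allocated"] == sum(e["allocated"] for e in entries)
    )


if __name__ == "__main__":
    print(free_gpus_by_model(SAMPLE_RESPONSE))
    print(totals_match(SAMPLE_RESPONSE))
```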

Query parameters for filtering


| Param | Description |
| --- | --- |
| provider | Either a provider address (ex: akash175llqyjvxfle9qwt740vm46772dzaznpzgm576) or a host URI (ex: https://provider.akashprovid.com:8443) |
| vendor | Ex: nvidia |
| model | Ex: t4 |
| memory_size | Ex: 16Gi |

All query parameters can be combined, for example:
https://api.cloudmos.io/internal/gpu?provider=akash175llqyjvxfle9qwt740vm46772dzaznpzgm576&vendor=nvidia&model=rtx3060ti&memory_size=8Gi
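
A minimal sketch of building such a URL from the filters listed above, using only the standard library. The function name `build_gpu_url` is hypothetical. Filters left as `None` are omitted, and values are percent-encoded, which matters when `provider` is a host URI rather than an address.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.cloudmos.io/internal/gpu"


def build_gpu_url(
    provider: Optional[str] = None,
    vendor: Optional[str] = None,
    model: Optional[str] = None,
    memory_size: Optional[str] = None,
) -> str:
    """Build the /internal/gpu URL, including only the filters that are set."""
    params = {
        "provider": provider,
        "vendor": vendor,
        "model": model,
        "memory_size": memory_size,
    }
    # urlencode percent-encodes reserved characters such as ":" and "/".
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


if __name__ == "__main__":
    print(build_gpu_url(vendor="nvidia", model="t4"))
    print(build_gpu_url(provider="https://provider.akashprovid.com:8443"))
```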

https://api.cloudmos.io/internal/provider-versions

Returns a list of versions and the providers currently running each version. The <UNKNOWN> version corresponds to providers whose version could not be determined. The /version endpoint was broken for a long time, but is now fixed in v0.5.0-rc11.

Example Response

{
  "0.5.0-rc16": {
    "version": "0.5.0-rc16",
    "count": 4,
    "ratio": 0.05,
    "providers": [
      "https://provider.moonbys.cloud:8443",
      "https://provider.akashprovid.com:8443",
      "https://provider.akashtesting.xyz:8443"
    ]
  },
  "0.5.0-rc15": {
    "version": "0.5.0-rc15",
    "count": 4,
    "ratio": 0.05,
    "providers": [
      "https://provider.akash.pro:8443"
    ]
  },
  "<UNKNOWN>": {
    "version": "<UNKNOWN>",
    "count": 80,
    "ratio": 0.95,
    "providers": [
      "https://provider.macptrading.com:8443",
      "https://provider.digitaler-friedhof.com:8443",
      "https://provider.qioi.io:8443",
      "https://provider.bluepeer.io:8443",
      ...
    ]
  }
}
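
One possible way to consume this response for monitoring is to rank the determined versions by provider count and report the `<UNKNOWN>` bucket separately. The sketch below is illustrative only: the helper names are hypothetical, and the sample data reuses the counts and the visible provider URLs from the example above, with the elided part of the provider list left out. It relies on the top-level keys being the version strings, as in the example.

```python
from typing import Dict, List, Tuple

UNKNOWN_KEY = "<UNKNOWN>"

# Counts and provider URLs taken from the example response above;
# the elided remainder of the <UNKNOWN> provider list is not reproduced.
SAMPLE_RESPONSE: Dict[str, dict] = {
    "0.5.0-rc16": {
        "count": 4,
        "providers": [
            "https://provider.moonbys.cloud:8443",
            "https://provider.akashprovid.com:8443",
            "https://provider.akashtesting.xyz:8443",
        ],
    },
    "0.5.0-rc15": {
        "count": 4,
        "providers": ["https://provider.akash.pro:8443"],
    },
    UNKNOWN_KEY: {
        "count": 80,
        "providers": [
            "https://provider.macptrading.com:8443",
            "https://provider.digitaler-friedhof.com:8443",
            "https://provider.qioi.io:8443",
            "https://provider.bluepeer.io:8443",
        ],
    },
}


def rank_known_versions(response: Dict[str, dict]) -> List[Tuple[str, int]]:
    """Return (version, count) pairs for determined versions, highest count first."""
    known = [
        (key, entry["count"])
        for key, entry in response.items()
        if key != UNKNOWN_KEY
    ]
    # sorted() is stable, so versions with equal counts keep the response order.
    return sorted(known, key=lambda pair: pair[1], reverse=True)


def unknown_count(response: Dict[str, dict]) -> int:
    """Number of providers whose version could not be determined."""
    return response.get(UNKNOWN_KEY, {}).get("count", 0)


if __name__ == "__main__":
    print(rank_known_versions(SAMPLE_RESPONSE))
    print(unknown_count(SAMPLE_RESPONSE))
```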

@Redm4x Redm4x marked this pull request as ready for review February 24, 2024 17:56
@Redm4x Redm4x merged commit db887c0 into main Feb 24, 2024
5 checks passed
@Redm4x Redm4x deleted the features/add-gpu-stats-endpoint branch February 24, 2024 17:58