@leonace924
Contributor

@leonace924 leonace924 commented Jan 6, 2026

ℹ️ To keep reviews fast and effective, please make sure you’ve read our pull request guidelines

📝 Summary of changes done and why they are done

  • Explicitly pushing status=down now bypasses retry logic and goes directly to DOWN

Problem

When a user explicitly pushes status=down to a push monitor with retries configured, the monitor incorrectly goes to PENDING instead of DOWN. Users expect that explicitly pushing down should immediately mark the monitor as DOWN.

Reproduction:

  1. Create push monitor with Retries = 3
  2. Push status=down explicitly: curl "http://localhost:3001/api/push/TOKEN?status=down"
  3. Bug: Monitor goes to PENDING (not DOWN)

Root Cause

The determineStatus() function applies retry logic to all DOWN statuses, regardless of whether the status was explicitly provided by the user or resulted from a timeout/missed push.

Solution

Added an isExplicitDown flag that bypasses retry logic when status=down is explicitly pushed:

// Detect explicit down push
const isExplicitDown = request.query.status === "down";

// In determineStatus(): bypass retry logic
if (isExplicitDown && status === DOWN) {
    bean.retries = 0;
    bean.status = DOWN;
    return;
}
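
To illustrate how the bypass sits next to the retry path, here is a minimal, self-contained sketch. It is not the project's actual determineStatus(): the signature, the numeric constant values, the shape of `bean` and `previous`, and the retry bookkeeping are all assumptions made for this example. Only the `isExplicitDown` branch mirrors the snippet above.

```javascript
// Status constants; the numeric values are placeholders chosen for this sketch.
const DOWN = 0;
const UP = 1;
const PENDING = 2;

/**
 * Hypothetical, simplified stand-in for determineStatus().
 * @param {number} status Status reported for this heartbeat (UP or DOWN)
 * @param {{status: number, retries: number}} bean Heartbeat being built (mutated)
 * @param {{status: number, retries: number}|null} previous Previous heartbeat, if any
 * @param {number} maxRetries Retries configured on the monitor
 * @param {boolean} isExplicitDown True when status=down was pushed explicitly
 * @returns {void}
 */
function determineStatus(status, bean, previous, maxRetries, isExplicitDown) {
    // Explicit status=down: skip retries and go straight to DOWN.
    if (isExplicitDown && status === DOWN) {
        bean.retries = 0;
        bean.status = DOWN;
        return;
    }

    if (status === UP) {
        bean.retries = 0;
        bean.status = UP;
        return;
    }

    // Implicit DOWN (timeout or missed push): consume a retry first (assumed bookkeeping).
    const usedRetries = previous ? previous.retries : 0;
    if (usedRetries < maxRetries) {
        bean.retries = usedRetries + 1;
        bean.status = PENDING;
    } else {
        bean.retries = usedRetries;
        bean.status = DOWN;
    }
}
```

With Retries = 3, calling `determineStatus(DOWN, bean, previous, 3, true)` sets `bean.status` to DOWN immediately, while the same call with `isExplicitDown` set to `false` yields PENDING until the retries are used up.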

Behavior Changes

| Scenario | Before | After |
| --- | --- | --- |
| Push status=down with retries | PENDING | DOWN |
| Timeout/missed push with retries | PENDING | PENDING (unchanged) |

📋 Related issues

📄 Checklist

Please follow this checklist to avoid unnecessary back and forth (click to expand)
  • ⚠️ If there are breaking changes (a fix or feature that alters existing functionality in a way that could cause issues), I have called them out.
  • 🧠 I have disclosed any use of LLMs/AI in this contribution and reviewed all generated content.
    I understand that I am responsible for and able to explain every line of code I submit.
  • 🔍 My code adheres to the style guidelines of this project.
  • ⚠️ My changes generate no new warnings.
  • 🛠️ I have reviewed and tested my code.
  • 📝 I have commented my code, especially in hard-to-understand areas (e.g., using JSDoc for methods).
  • 🤖 I added or updated automated tests where appropriate.
  • 📄 Documentation updates are included (if applicable).
  • 🔒 I have considered potential security impacts and mitigated risks.
  • 🧰 Dependency updates are listed and explained.

📷 Screenshots or Visual Changes

  • UI Modifications: Highlight any changes made to the user interface.
  • Before & After: Include screenshots or comparisons (if applicable).

Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=42954461

@leonace924
Contributor Author

@CommanderStorm would you review this PR as well? It fixes two issues.

Collaborator

@CommanderStorm CommanderStorm left a comment

Could you please split this into separate PRs? We try to keep it to one PR = one logical change.

The second change looks promising (though likely breaking), but the first isn’t quite ready yet. Keeping them separate will make review and decision-making much easier.

@leonace924
Contributor Author

@CommanderStorm this has been split out into its own PR.

@CommanderStorm CommanderStorm changed the title from "fix: push monitor retry and notification issue" to "fix: Explicitly pushing status=down now bypasses retry logic and goes directly to DOWN" Jan 6, 2026
@CommanderStorm CommanderStorm added this to the 3.0.0 milestone Jan 6, 2026
Collaborator

@CommanderStorm CommanderStorm left a comment

LGTM, but won't be merged into v2 due to likely being a breaking change

@CommanderStorm CommanderStorm added the pr:depends on other pending other things to be done first label Jan 6, 2026
leonace924 and others added 2 commits January 5, 2026 23:15
@leonace924
Contributor Author

Hi @CommanderStorm, when do you plan to release v3.0? 😉

@CommanderStorm
Collaborator

This will take a while. If you want, you can rebase it onto the v3.0 branch and we can merge it there.

@dallyger

dallyger commented Jan 7, 2026

Is there a way to opt back in to explicitly push a pending status? Something like a status=pending or status=retry query parameter.

We currently use the DOWN status with retries to distinguish between "command did not run" and "command ran but was not successful" while keeping the time metric of those failed runs.
Those would be lost if we skip the ping. Marking the status as DOWN directly is not preferred, as the commands are allowed to fail sometimes as long as they run again soon.

One such use case is a command that monitors the disk usage of a server and warns above e.g. 80%, where I misuse the time metric as the disk usage percentage. The graph shows 0-100ms but means 0-100% disk used.
Not pushing the "pending" status means that I don't know how rapidly the disk is filling up after reaching the threshold. Pushing it as DOWN immediately would trigger a lot of notifications if usage fluctuates between 79-80%.
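
The disk-usage setup described above could be sketched roughly as follows. This only builds the push URL and sends nothing. The `/api/push/TOKEN` path and the `status` parameter come from the reproduction steps in this PR; the `ping` and `msg` parameter names, the helper name `buildPushUrl`, and the default threshold are assumptions made for illustration.

```javascript
/**
 * Build a push URL that reports disk usage through the time metric.
 * Hypothetical helper; the `ping` and `msg` query parameter names are assumed.
 * @param {string} baseUrl Base URL of the monitoring instance
 * @param {string} token Push token of the monitor
 * @param {number} usedPercent Disk usage in percent (0-100)
 * @param {number} threshold Usage above this value is reported as down
 * @returns {string} Full push URL
 */
function buildPushUrl(baseUrl, token, usedPercent, threshold = 80) {
    const url = new URL(`/api/push/${encodeURIComponent(token)}`, baseUrl);
    // Above the threshold the push reports "down"; otherwise "up".
    url.searchParams.set("status", usedPercent > threshold ? "down" : "up");
    // The time metric is repurposed to carry the usage percentage.
    url.searchParams.set("ping", String(usedPercent));
    url.searchParams.set("msg", `disk ${usedPercent}% used`);
    return url.toString();
}
```

Under the behavior proposed in this PR, a URL built with `usedPercent` above the threshold would mark the monitor DOWN on the first push, which is the fluctuation concern raised here.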

@CommanderStorm
Collaborator

> Is there a way to opt back in to explicitly push a pending status? Something like a status=pending or status=retry query parameter.

Yes, if you push status=pending this would not change this.
As far as I know, retries will still apply as usual.

PS: I know this is a breaking change for you, but I still want to give users the ability to fail the check outright, instead of effectively having two pending states (one named "pending", one named "down") for the monitor.
This change is a while out, as I will only merge it into v3.0.

@dallyger

dallyger commented Jan 7, 2026

> Yes, if you push status=pending this would not change this.

Awesome, that is all I need.

> PS: I know this is a breaking change for you

No worries. As long as it is clearly communicated, I see no issues with that. A fix on my end is easy and can be applied even before upgrading to the v3.0.

I do prefer the distinction between status=pending and status=down too! In fact, this is what I suggested in #3208, which resulted in the DOWN state respecting the retries:

> I think the best action would be to add a third state (up, down and pending)

I also saw this issue while searching for mine: #4785.
So maybe even a fourth status could be added?
My initial idea would be to explicitly allow these states:

up - Mark monitor as up.
down - Mark monitor as down ignoring retries.
retry - Mark monitor as pending (or keep it DOWN) as it will be retried later on
pending - Do not change monitor status but add metrics / message to the history.
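
The four proposed states could be mapped to actions along these lines. This is a sketch of the proposal only, not existing behavior: the action names, the function `resolvePushAction`, and the choice to treat a missing parameter as `up` are all hypothetical.

```javascript
// Proposed status values mapped to hypothetical action names.
const PROPOSED_ACTIONS = new Map([
    ["up", "MARK_UP"],                    // Mark monitor as up
    ["down", "MARK_DOWN_IGNORE_RETRIES"], // Mark monitor as down, ignoring retries
    ["retry", "MARK_PENDING"],            // Pending, will be retried later on
    ["pending", "RECORD_ONLY"],           // Keep status, only record metrics/message
]);

/**
 * Resolve the `status` query parameter to a proposed action.
 * @param {string|undefined} statusParam Raw value of the status query parameter
 * @returns {string|null} Action name, or null for an unknown status
 */
function resolvePushAction(statusParam) {
    // Treating a missing parameter as "up" is an assumption of this sketch.
    const key = String(statusParam ?? "up").toLowerCase();
    return PROPOSED_ACTIONS.get(key) ?? null;
}
```

Returning `null` for unknown values leaves it to the caller to decide whether to reject the push or fall back to a default.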

Alternatively the pending could keep UP as UP but move DOWN to PENDING. Not sure which behavior I would prefer. Does it cause notifications if it then goes back to DOWN? Is this maybe even preferred?

@CommanderStorm
Collaborator

> can be applied even before upgrading to the v3.0.

Breaking changes, no matter how nice, need to be done in a semver major.
We lose all trust otherwise.

@CommanderStorm
Collaborator

CommanderStorm commented Jan 7, 2026

> So maybe even a fourth status could be added?

Introducing a fifth status (no matter if it is partial, retrying or whatever) would involve a fairly large amount of work, as most of our internal code and notification providers aren’t currently designed with this in mind. Supporting it properly would require changes across multiple areas.

Given my current review capacity, I’m unfortunately not able to take on or oversee additional work of this scope right now.

For some additional background and context, please see:

There are also some larger refactorings (for example, starting to increase our test coverage) needed in how our internal components interact before we could realistically consider something like this.
Currently, the risk of such a change is way too high.
If you want to contribute to the effort of reducing that risk, I can mentor/guide you towards this goal.

Labels

needs:resolve-merge-conflict pr:depends on other pending other things to be done first

Development

Successfully merging this pull request may close these issues.

Explicitly pushing down status does not work anymore

3 participants