Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev2 - Z-probe sometimes ignored - crash into bed #910

Open
kyrreaa opened this issue Mar 1, 2020 · 35 comments
Open

dev2 - Z-probe sometimes ignored - crash into bed #910

kyrreaa opened this issue Mar 1, 2020 · 35 comments

Comments

@kyrreaa
Copy link
Contributor

kyrreaa commented Mar 1, 2020

During large probing sessions the back move is apparantly not executed correctly and motion continues to bed. Popping probe up to inactive state. (In this state it reports H as if it was triggered.)
Instead of aborting and backing off and reporting an error, the fw continues to grind into bed.

If I deactivate the probe before a G30 to test it jiggles about and aborts with message to host about error, so this works as expected.

I have also noticed that some of the back-moves after probe hit is done at wrong speed and I suspect this is related.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

There are some magic numbers in the code "Motion1::stopMask = 4; // z trigger is finished" that seem odd. It basically refers to A_AXIS and not ZPROBE_AXIS?

@repetier
Copy link
Owner

repetier commented Mar 1, 2020

Mask is a bit field, so 1 = x, 2 = y, 4 = z, 8 = E, 16 =a, ...

What probe type are you using? In other thread you said type 2 but that is nozzle as probe which has no offset to activate. There only position is moved to valid region = bed minus safety border.

I guess you use type 1 = default zprobe or 3 = BLtouch.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

Yes, sorry. I use type 1.
This kind of looks like race condition or interrupt issue to me.
I find it odd though that it happens very rarely. If it was an instance of buffered state not being updated (classic volatile stuff) then it would happen all the time.

There is the possibility though that the previous move to location triggers the endstop to snap to disabled and that this happens at the exact point the code switches over to probing. Since it checks the endstop only once before recalculating the path needed for the probe it could possibly end up not seeing the transition as it queues the move and starts to wait?
If the transition happens before endstopMode = EndstopMode::PROBING would it not be ignored?

@repetier
Copy link
Owner

repetier commented Mar 1, 2020

Not sure I can follow here. Looking at float ZProbeHandler::runProbe() I see that it return without any move if probe is already triggered. Inside the for loop it does measurements and always moves back again to start position, so will in theory never stop at bottom. Then it is updated again.

One problem are zprobes where high signal is only short like BLtouch. That is why we have that as special case to test also fro probe error state. That is the only probe I have for testing at the moment. Hope I can test soon - printer is not 100% functional at the moment a sit is a delta and I also need to check delta function working in parallel. Anyhow in this code segment

        // Go up to tPos2
        FOR_ALL_AXES(i) {
            int32_t df = tPosSteps2[i] - tPosSteps3[i];
            corSteps2[i] = ((df > 0) ? 1 : ((df < 0) ? -1 : 0)) * (labs(df) - Motion1::stepsRemaining[i]);
        }
        Motion1::moveRelativeBySteps(corSteps2);
        Motion1::waitForEndOfMoves();

The head should always move back. What is your Z_PROBE_REPETITIONS value? On repeated moves it does not move full distance back and there could be miscount.

BTW: What is your test script you are running here? Just to understand the condition under which the problem happens.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

I am running bed map from Repetier Host with 20x20 points.
I added some code in the runProbe to investigate. I moved the code in waitForEndOfMoves out into a local while loop and added tests for !Motion1::axesTriggered && ZProbe->update(). This will run at loop speed with any free cpu resources as the system spins anyways. Thus no impact on performance. I then Print a warning that there has been a pin change isr miss on probe and count iterations of loop that sees the same state. After 3 iterations I print a new warning about aborting and abort the loop. I do see my message printed every 10 probes or so, but I do not see the second message. My theory is that the first message can come every once in a while due to timing conditions but since the pin change should have higher priority than main it should not be many iterations of delay.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

I only do single probes. No repeated with averages.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

My understanding is that only updateReal() in EndstopSwitchHardwareDriver will update correctly.
Calling update() should just read the state set by the updateReal().
Since this appears not to always end the iterations I can assume that somehow the state change is registered before the status to actually update/probe is added.

My suggestion is to check the flag again before waiting, but after setting the states to probe.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

I am trying to add an error message right before the moveByPrinter() to catch this.
If it triggers then it would confirm something like this is happening. Hopefully I can fix it before it breaks my "one of a kind" hotend.

@repetier
Copy link
Owner

repetier commented Mar 1, 2020

What endstop type are you using? I currently always use ENDSTOP_SWITCH_HW which triggers an interrupt and stop move directly. The polling solution always calls update() to get updated state on each step.

updateReal() is special case for hardware endstop. That is the function called by processor on signal change and updates state for later use in update(). Looking into it you would need to have the end stop for z probe defined as zaxis, false (dir -1) so it stops motion

          if (axis >= 0 && newState) { // tell motion planner
                endstopTriggered(axis, dir);
            }

as that is the only way it would tell it if you have
#define NO_SOFTWARE_AXIS_ENDSTOPS
to speed up endstop testing.
If it does not stop then it would move the z to the end loosing steps and steps remaining is 0 and does go up to higher position.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

Nope. Did not catch it... It crashed into bed again. This time I simplified my test.
Just a positioning of head to a x,y coordinate and running a script with 200 probes.
G30 P1
G30 P0
G30 P0
...
G30 P2

It just kept dinking into the same spot untill it suddently kept on going and crashed into bed.
This pops the probe up as I mentioned so it will stay at high level. Still it did not stop.
I confirmed it is sending high after popping.
Adding the check for probe state in the loop instead of waitForEndOfMoves() seemed to work.
I will have to retest to confirm but it did not appear to crash when testing there. I could be just lucky though.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

And yes, I use ENDSTOP_SWITCH_HW and that is the issue I think.
Somehow it does not stop the move.

@repetier
Copy link
Owner

repetier commented Mar 1, 2020

Can you post your 2 configurations. Guess it might be important which exact settings you have here.
You could also try without pushing endstops and use polling instead to see if that is the problem. Will test as soon as my time allows. Depending on what it is it could also happen to homing where the same technique to stop is used.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

I guess I don't home as often as I probe so statistics is in my favour there.
I can push to my fork so it can be accessed without having to fill the issue up with files.

@repetier
Copy link
Owner

repetier commented Mar 1, 2020

Also great, just give me the link then and I check there.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

I will switch the z probe to polled and compare.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

Wow. Well, changing the z probe to a simple polled probe makes it just ignore it all together.
M119 shows it is changing states properly but it is being ignored. It also behaves more correctly on the homing part as it does not appear to rehome x and y before probing when it is a sw probe.

Edit: I ofcource forgot to remove NO_SOFTWARE_AXIS_ENDSTOPS.

Edit 2: Nope. Still ignores it...

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

I think I found something.
In WInterrupts.c the pin change interrupts are initialized with NVIC_SetPriority(PIOD_IRQn, 0); in my case for the endstop input on Ramps-FD I use as probe. In fact all ports are inited to level 0.
I use Zmin btw, which is PIN D26 aka GPIOD.1 but that is irrelevant when it does so with all...
It is therefore possible for the pin change to happen and interrupt the MOTION3 level as it is configured for DUE in HAL.cpp as level 2.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

Could it be as simple as motion trying to move to next segment of travel while being interrupted by pin-change which sets flags to abort in the active segment. The active segment is then discarded when motion resumes and it can no longer see it should have stopped?

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

WInterrupts.c is part of the core arduino framework by the way, so a workaround or safeguard would be needed in motion...

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

I am using quite slow accelleration so it is possible that the move is split into multiple segments to handle the accelleration and the hig happens just as it runs activateNext()? This would allow the probe hit to affect the old segment and then upon continue the new segment replaces it.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 1, 2020

Well... I thought I could bypass it by removing #define NO_SOFTWARE_AXIS_ENDSTOPS and letting the code check as normal. This did not work.
I expect the issue has to do with paths, segments and timing so I made a new testscript that changes to relative mode (g91) and increases z by 0.01mm (g0 z0.01) before changing back to absolute mode (g90) between each probe (g30 p0). The script starts with probe almost touching and a "g30 p1".
After reaching about the height I use to probe over my slightly warped bed it crashed.

@repetier
Copy link
Owner

repetier commented Mar 2, 2020

Motion3 should have highest priority, but I saw it was set to 3 in hal.cpp. Changing it to
NVIC_SetPriority((IRQn_Type)MOTION3_TIMER_IRQ, 0); // highest priority - no surprises here wanted
should solve that problem with priority.

Looking into code I think skipParentId would be the crucial variable. It gets set to force Motion3 to finish line, but only activateNext changes it if the line is finished meaning z move is executed. So even if we missed the check it means 1 extra step until move is popped.

I have tested it 100 times on my delta G1 P1 + 18x G30 P0 then G30 P2 no errors. That is on a stm32F4 board where interrupt priority is 0. So not sure if it priority, other speed or just luck.
Best you try with motion3 priority 0 and see if it improves.

I also got no shift of position, but maybe I need to move xy in between to get that error.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 2, 2020

I think I discovered the reason for the shift.
Basically I had z home height set to 0 which disables any move avter homing, but I had tool change z lift to 2mm which tried to move z up by 2mm after homing. Since it was already at Zmax height (after backing off the endstop 2mm) it could not move up and so reset the machine homed and said it cannot be trusted.
The rehoming done for probing rehomes x and y but not z so it is kind of odd.
Changing the z home height to within the work envelope fixed this.

As for the priority, pin change interrupts appear to have higher priority than timer interrupts so even at equal priority ranking the pin change will win. Solution is to make sure the operation in timer3 is safe.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 2, 2020

The reproduction is very dependant on machine accelleration I think. I made my script like this:

`
G30 P1

G91
G0 Z0.01
G92
G30 P0

G91
G0 Z0.01
G92
G30 P0

G91
G0 Z0.01
G92
G30 P2
`

The middle two repetitions would then be copied a LOT of times. (2-400) and it would probe same spot over and over. Starting with probe almost touching.
Mine has accelleration set to 500mm/s^2 and it fails at around 3mm.
It may also be dependant on the segment count or such if exists... Deltas are different from linear machines there I guess.

I also want to repeat that it did not work even if I removed the NO_SOFTWARE_AXIS_ENDSTOPS which could mean the polled method is vulnerable too.

@repetier
Copy link
Owner

repetier commented Mar 2, 2020

There is an error in MotionLevel3.cpp line 166

        if (testMotorId == Z_AXIS && Motion1::endstopMode == PROBING) {
            if (ZProbe->update()) { // ignore z endstop here
                Motion2::endstopTriggered(act, act->directions & axisBit, false);
            }
        }

I think it should be

      if (Motion1::endstopMode == PROBING) {
           if (ZProbe->update()) { // ignore z endstop here
               Motion2::endstopTriggered(act, Z_AXIS, false);
           }
       }

Explains at least why polling for z probe does not work.

Regarding interrupt - you are right as I understand the lowest vector is the tie braker here. So adding

    NVIC_SetPriority(PIOA_IRQn, 1);
    NVIC_SetPriority(PIOB_IRQn, 1);
    NVIC_SetPriority(PIOC_IRQn, 1);
    NVIC_SetPriority(PIOD_IRQn, 1);

to void HAL::setupTimer() should help here. When called it blocks interrupts

void endstopTriggered(fast8_t axis, bool dir) {
    InterruptProtectedBlock noInt;
    Motion2::endstopTriggered(Motion3::act, axis, dir);
}

so we only need to prevent calling it while in motion3, also I still do not see how this can cause the side effect but it makes sense regarding the frequency it happens.

Would be great if you can recheck. Will also try tomorrow if I can get it with my setup - but different processor alone could already prevent it from happening. Maybe I should test on my felix also it has other z probe type, but same timings.

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 2, 2020

Very cool. I've been avoiding the bed map so far due to this and focusing on the tuning of the retracts etc. Later I will run direct extrusion but I need to design the extruder first...

Here's a clip showing Repetier Firmware (with slight patch) run a real Core-XYZ!
https://youtu.be/B8z4qNUmVlg

@AbsoluteCatalyst
Copy link

AbsoluteCatalyst commented Mar 4, 2020

Hey! Back from a little absence.
Ah... I had this same exact problem a while back ago but I had assumed it may have just been something I broke.
I realize now I should have probably made a github issue about it, apologies.
I don't have a full solution, but I have a workaround from what I last traced down. Hopefully this might help narrowing the bug down or at least echos what's been posted so far.

But one first bug:

I have also noticed that some of the back-moves after probe hit is done at wrong speed and I suspect this is related.

I'm not entirely sure if it's related, though I had this issue as well.

In MotionLevel1.cpp line 784, in the function:
Motion1::moveRelativeBySteps(int32_t coords[NUM_AXES])

buf.length would occasionally already have something it was set to, affecting the planner's speed.
Clearing it to zero just as startSpeed and endSpeed are appears to fix that issue.
buf.startSpeed = buf.endSpeed = buf.length = 0;

As for the main bug,
endstopTriggered seems to be called properly, however the Motion3::act pointer is occasionally null so the function doesn't execute further. It appeared to be a sort of race condition between interrupts but I never got to pin it down fully at the time.

My fix was to attempt to skip to the next act via Motion3::activateNext() if it's a null pointer and we're also probing. A bit dirty but fixed it for the time being.
The vast majority of the time this actually works and obtains the correct next act, and I can usually get a 400 point bed map to work. But in some seemingly super rare occurrences it'll still fail, in which case I kill the printer rather than have a bed crash.

In MotionLevel2.cpp, line 557

void Motion2::endstopTriggered(Motion3Buffer* act, fast8_t axis, bool dir) {
    if (act == nullptr || act->checkEndstops == false) {
        // Moses - Hotfix for failing probe endstop
        // For some reason sometimes the motion3buffer is empty/null when endstopTriggered is called,
        // leading to a failed z probe and then a bed crash.
        if (Motion1::endstopMode == EndstopMode::PROBING) {
            Motion3::activateNext();
            act = Motion3::act;
            //endstop failed hit
            DEBUG_MSG_FAST("ESTF");
            if (act == nullptr) {
                //skipped buffer failed
                DEBUG_MSG_FAST("SKBF");
                HAL::delayMilliseconds(2000);
                Printer::kill(0);
                return;
            } else {
                DEBUG_MSG_FAST("ATT");
            } 
        } else {
            //failed act
            DEBUG_MSG_FAST("FNACT");
            return;
        }
    }
...

Regarding interrupt - you are right as I understand the lowest vector is the tie braker here. So adding...
I've just tried adding this bit and also updating motion 3's priority in HAL, but unfortunately the problem still persists. I wish I would have dug into this a little further when I first caught it.

However:

There is an error in MotionLevel3.cpp line 166

Thanks! er, this fixes another issue I had assumed was just my configuration/HAL modifcations... software Z endstop works correctly now.

And now: using software endstops seems to eliminate the problem, possibly confirming it's interrupt related.

--

A side note: Once the bed probing problem was fixed, another issue I had was that distortion was being applied in "reverse". My printer's bed physically moves downwards whenever I "raise" the Z axis.
I fixed it by changing instances of addDistortion to subDistortion (MotionLevel1.cpp:832, MotionLevel2.cpp:221).
I'm assuming this'll be configurable in the future though?

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 4, 2020

I've just found it odd how we map from probe level to bed with probe. Makes the maps created by Repetier Host look odd and inverted.
I would prefer a positive number to mean ved rises abowe z0 and a negative to be below. Thus probing would not change result based on height it was started from.

@repetier
Copy link
Owner

repetier commented Mar 4, 2020

endstopTriggered

eat post. Regarding buf.length you are 100% right. Not only occasionally it is not zero but it just has the length 16 moves ago and then accumulates to that, so setting to 0 is a must.

Regarding act = 0 is also a good hint. That in deed would lead to the hardware interrupt not triggering end stop. And at the end of each motion3 block you have:
// Test if we are finished
if (--act->stepsRemaining == 0) {
if (act->last) {
Motion2::pop();
}
InterruptProtectedBlock noInts;
--length;
act = nullptr;
}
it would only get set back at the next start of motion3 and as @kyrreaa pointed out the interrupt priority would allow it already to interrupt there, but more likely it happens in the time between the 2 motion3 steps so even improved priority handling (which is still good) would not solve it fully. So in total I'm quite sure we now have the reason it happens.
Correct solution is an adoption of your idea - not kill printer but run activateNext() if length is remaining. Plus we need to ensure interrupt priority is ok.

Just testing it and fixing delta problems as well to make G32 complete, so I can test the distrotion problem. No need to choose - it just should do it correct:-)

@repetier
Copy link
Owner

repetier commented Mar 4, 2020

Host just measures distances from given z and reports them. It is not a correction map. But I agree on sign in firmware correction it would be intuitive to have positive number for above bed. I think currently it is different but I'm as said struggling to get measurement complete. Computation of correct allowed region is somehow not as wanted so last row measures off bed. Special delta problem where not all points are reachable in a circle:-)

@kyrreaa
Copy link
Contributor Author

kyrreaa commented Mar 6, 2020

Yes. Ideally such a probing would just be a circular or spiral pattern on a delta.
This ofcource makes the math a bit more complex but in reality it should be possible to find 3 points that create a vertex which a current coordinate is within and interpolate between the 3 values for those points. If value is close enough to an edge between two points the math is easier.

We have noticed the correction done in development for the probing as it seems to improve things a lot. But still something is off with the extrapolated corners.

Looking forward to testing the new tweaks to see if I can atleast circumvent the probing issue. Machine repeatability seems really good so far and I need the probing to determine where the machine has deviations vs where the bed is actually warped. Will be probing a granite straight-slab. That should show what is real... (Nice to have a machine you can actually put a 2' granite slab in btw.

@repetier
Copy link
Owner

repetier commented Mar 8, 2020

Ok, I think I have sorted out the problems. Only thing is with my delta I would never get the z-homing error since the update happens inside motion3 timer in that case due to delta special problems. So would be great if you can confirm that it is really working now.

I also improved (i hope) the extrapolation of unknown points.

Swapped distortion correction direction. Positive values are upwards bumps in bed.

I also added M48 to check z probe repeatablility. Would be great if someone could test that as well and port the result. Somehow I get with my 3d touch a very worse repeatability with difference > 0.2mm which is not good. Since that is a new sensor I don't know if I missposition all the time or if the sensor is that bad.

@AbsoluteCatalyst
Copy link

AbsoluteCatalyst commented Mar 9, 2020

Merged/updated to the new changes. Thanks for the M48 function. Two quick bugs I spotted:

M48 returns NAN if the bump distortion is disabled via EEPROM since Leveling::distortionAt returns weird/NAN values. In ZProbeHandler::runProbe() line 265, uncommenting the isDistortionEnabled check fixes it.

In the Motion2::endstopTriggered bug fix: the buffer act pointer wasn't being updated after activateNext() so it'd firmware crash. After adding that line, no issues spotted so far!

After running M48 I got:
Average: 4.9871
Min: 4.9738
Max: 4.9900
Std. deviation: 0.0052
(used 10 probes)

Running it a few times in different spots and the deviation hovers around the same (0.0008+/-).
I'm using a BLTouch-Smart v3.1, so that seems to match up with what they say it should.

>0.2mm deviation seems crazy. Is that a clone? I remember reading that clones had poor accuracy compared, but never realized they were that bad...

Unfortunately I don't own a delta printer so I can't test any delta related changes. :(

@repetier
Copy link
Owner

repetier commented Mar 9, 2020

Thanks for testing. Have updated github with fixes. Did remove the distortion adding in zprobe. I think it is better to not include it there. Before my update it was even forbidden, but now it just gives a warning. Problem is that the result depends on your starting height which might already compensate that error fully or partially or not at all.

Your M48 look like what I'd expect. I've seen a comparison between bltouch and 3dtouch where bltouch was better but just maybe factor 2. I currently are way off. Unfortunately the bltouch I bought is kinda defect. On a head brush the pin was bent and now it does not reliable pull/push the pin any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants