Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB methods disable interrupts for too long #397

Open
ianrrees opened this issue Feb 1, 2021 · 20 comments
Open

USB methods disable interrupts for too long #397

ianrrees opened this issue Feb 1, 2021 · 20 comments

Comments

@ianrrees
Copy link
Contributor

ianrrees commented Feb 1, 2021

I'm working on a SAMD21 based USB device, which does the USB work in the USB ISR, along the lines of the usb_echo example. The USB interrupt is set to a lower priority (higher number) than another interrupt which needs to be serviced fairly quickly, but sometimes that doesn't happen quickly enough. The issue seems to be that UsbBus's methods do all their work in interrupt free contexts. For example, these.

Might it make sense for UsbBus to have an "in use" flag in the mutex rather than the RefCell, so that the interrupts only need to be disabled for setting/checking that flag?

@twitchyliquid64
Copy link
Contributor

Can you elaborate on 'too long'? We tried to do as little as possible in the UsbBus inner implementation.

The USB interrupt is set to a lower priority (higher number) than another interrupt which needs to be serviced fairly quickly

Depending on how many priority bits your chip has, could you boop the USB to have less priority than your other CSR?

Might it make sense for UsbBus to have an "in use" flag in the mutex rather than the RefCell, so that the interrupts only need to be disabled for setting/checking that flag?

I'm not sure i follow, can you give me an example?

@ianrrees
Copy link
Contributor Author

ianrrees commented Feb 2, 2021

Can you elaborate on 'too long'?

Maybe more importantly than the specific timing constraint, I was surprised to find that the entirety of read() and write() is in an interrupt free block since they call copy_to_nonoverlapping(). Isochronous endpoints can have up to 1023B per read/write, so there would be a pretty wide range of time that interrupts will be disabled.

Specifically, I've got about a 16us (ed: at 48MHz) window to respond to a DMA interrupt, and I believe that response usually takes about 3us so I should be able to tolerate about 12us of interrupts being disabled. Admittedly I haven't put much effort in to profiling - an early test showed a case where the DMA interrupt wasn't handled when the processor was in the USB ISR, so I went digging to see how that could happen...

could you boop the USB to have less priority than your other CSR

Yep, I'm pretty sure that is the current arrangement, but perhaps got something wrong. Will investigate further.

I'm not sure i follow, can you give me an example?

Sorry, what I wrote was a bit unclear, and thinking about it more I may be misunderstanding the point of the mutex. Will try to catch you on matrix to discuss, but for completeness I was imagining adding an AtomicBool field to UsbBus to indicate that work is being done on the RefCell contents. It might look something like below, where swap is along these lines:

fn write(&self, ep: EndpointAddress, buf: &[u8]) -> UsbResult<usize> {
    if swap(self.busy_flag, true) {
        // self.inner was already in use
        Err(ComputerSaysNo)

    } else {
        // We have atomically set busy_flag, so are the only user of self.inner

        // write() proceeds with interrupts enabled
        let result = self.inner.borrow().write(ep, buf);

        // release the flag
        swap(self.busy_flag, false);

        result
    }
}

@twitchyliquid64
Copy link
Contributor

Ooooh I have an idea. Instead of turning off interrupts globally, what if we use mask and unmask to just turn off the IRQ lines for USB (USB_OTHER, USB_TRCPT0, and USB_TRCPT1on samd5x, just one on d21 iirc).

That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.

@twitchyliquid64
Copy link
Contributor

Something like this maybe:

struct UsbCS {
    other_active: bool, // USB_OTHER irq
    trcpt0_active: bool, // USB_TRCPT0 irq
    trcpt1_active: bool, // USB_TRCPT1 irq
}

impl core::ops::Drop for UsbCS {
    fn drop(&mut self) {
        use cortex_m::peripheral::NVIC;
        use crate::target_device::interrupt;
        unsafe {
            if self.other_active {
                NVIC::unmask(interrupt::USB_OTHER);
            }
            if self.trcpt0_active {
                NVIC::unmask(interrupt::USB_TRCPT0);
            }
            if self.trcpt1_active {
                NVIC::unmask(interrupt::USB_TRCPT1);
            }
        }
    }
}

impl UsbCS {
    fn mutex_cs(&self) -> cortex_m::interrupt::CriticalSection {
        unsafe { cortex_m::interrupt::CriticalSection::new() }
    }
}

/// Disables IRQ lines related to USB, returning a token that resets the
/// masking state when dropped.
fn usb_critical_section() -> UsbCS {
    use cortex_m::peripheral::NVIC;
    use crate::target_device::interrupt;

    let cs = UsbCS {
        other_active: NVIC::is_enabled(interrupt::USB_OTHER),
        trcpt0_active: NVIC::is_enabled(interrupt::USB_TRCPT0),
        trcpt1_active: NVIC::is_enabled(interrupt::USB_TRCPT1),
    };

    NVIC::mask(interrupt::USB_OTHER);
    NVIC::mask(interrupt::USB_TRCPT0);
    NVIC::mask(interrupt::USB_TRCPT1);
    cs
}

@ianrrees
Copy link
Contributor Author

ianrrees commented Feb 2, 2021

That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.

Is this the reason we're currently disabling interrupts? If so, it probably hasn't been working well in uses like this

#[interrupt]
fn USB() {
    usb_bus.poll(&mut [a_usb_device]);

    // this is a window for other interrupts

    a_usb_device.do_some_reads_and_writes();  // likely contains windows for other interrupts
}

I guess we'll want a method along the lines of interrupt_free() that uses a usb_critical_section, maybe that should be in UsbBus?

#[interrupt]
fn USB() {
    usb_bus.usb_interrupt_free(|cs| {
        // Would poll(), read(), write(), etc need a reference to cs, the UsbCS?
        usb_bus.poll(&mut [a_usb_device]);
        a_usb_device.do_some_reads_and_writes();
    });
}

@twitchyliquid64
Copy link
Contributor

Is this the reason we're currently disabling interrupts?

I think it was just to prevent us being interrupted by a different USB interrupt while servicing a USB interrupt.
I don't think there is anything wrong with being preempted by a higher priority interrupt as long as its not USB-related, but theres only one way to find out xD

@ianrrees
Copy link
Contributor Author

ianrrees commented Feb 4, 2021

That way we can be happily preempted by more urgent interrupts while servicing, as long as they dont take too long that we miss a frame.

Thinking about this more - I don't know if it's really what is needed. Imagine there's a GPIO interrupt that triggers a USB write - we wouldn't want it to stomp on an in-progress USB write...

I've just yanked out the interrupt::free(), Mutex, and RefCell - it seems to work this state on SAMD21 (which only has one USB interrupt) without doing any masking, so gets around my immediate problem. This is obviously sketchy though, there still needs to be a mechanism to only allow one "thread" to access the UsbBus at a time.

Since the methods where we had interrupt::free() are implementing usb-device, I guess we can't make them require a CriticalSection. In case of contention, should the second call to a UsbBus method block, or return an error - maybe InvalidState?

@twitchyliquid64
Copy link
Contributor

I've just yanked out the interrupt::free(), Mutex, and RefCell - it seems to work this state on SAMD21 (which only has one USB interrupt) without doing any masking, so gets around my immediate problem

Can you share the diff somewhere? Im curious how you did this

@ianrrees
Copy link
Contributor Author

ianrrees commented Feb 4, 2021

Sure, ianrrees@b129876 and its parent. It's not pretty :).

(edit: updated link)

@ianrrees
Copy link
Contributor Author

ianrrees commented Feb 5, 2021

seems to work ... without doing any masking

This was premature, it definitely needs a mechanism to prevent USB interrupts while already processing one.

@ianrrees
Copy link
Contributor Author

ianrrees commented Feb 8, 2021

I think there's a fundamental issue here, but feel a bit out of my depth.

Imagine that we have an implementation of mutex that allows work on the inner struct without disabling interrupts for the duration of that work. When a method tries to access the inner struct through that mutex, the mutex will either have to block or return an error. Since we don't have an OS, the blocking option doesn't make sense (the interrupted code that holds the lock won't make progress until the interrupting code is done), so the locking attempt must return an error. But, we don't have a way to get that error out of several methods required by usb_device that need exclusive access to the inner struct, for instance reset().

A secondary question is how to handle an interrupt that fails to get the lock.

@ianrrees
Copy link
Contributor Author

Sorry for my lack of follow-through on this, the above hack is working well enough in the meantime ;).

This was premature, it definitely needs a mechanism to prevent USB interrupts while already processing one.

Can't remember what prompted me to write this, but think it might be incorrect; AFAICT any particular ISR on SAMD21 won't ever interrupt itself.

I'm becoming convinced that although the mutex in the UsbBus implementation may be necessary in the current implementation of usb-device, with some hopefully-straightforward changes to usb-device (see rust-embedded-community/usb-device#9) it won't be. Currently, a typical use-case involves multiple structs that refer to the same UsbBus, but if we instead had one struct that owned all the USB stuff, then Rust's ownership guarantees should be sufficient to ensure that only one execution context can access it at a time.

There might still be a mutex required to mediate access to the USB stuff, but it would be easier to implement one that only blocks interrupts for a few instructions, for example.

@ianrrees
Copy link
Contributor Author

dirbaio on Matrix suggested that the stm32-usbd doesn't have this particular issue, perhaps it's possible to make UsbBus and/or Inner not by Sync, so that the user either picks their own mutex or only accesses the USB from one particular context.

@bradleyharden
Copy link
Contributor

@ianrrees, did you settle on a solution here? Does this need a change to the HAL or a change upstream?

@ianrrees
Copy link
Contributor Author

Currently I'm using a branch that removes the mutex/critical section (CS), it works but isn't ideal. Some possible ways we could resolve this, all involve changes to the HAL:

  1. usb-device changes along the lines of point 2 at Figure out a better way to keep track of the list of classes rust-embedded-community/usb-device#9 , which I think would then mean the HAL implementation wouldn't need the CS to begin with.
  2. It might be possible to push the CS down closer to the hardware, so that the external API stays the same but there are only a couple instructions at a time in a CS. From memory, this would be a fairly involved refactoring.
  3. Make the mutex implementation configurable via feature flag, and provide one that either takes the mutex or panics. I've found it's really not difficult to only access the USB methods from a USB interrupt context, and would expect that a bad access pattern with a lock/panic mutex would result in a panic pretty quickly. So, this isn't really the cleanest solution, but it might be a good enough "pressure relief valve" for situations where people need to use USB along with lower-latency interrupts.

@TDHolmes
Copy link
Contributor

TDHolmes commented Jun 15, 2021

for something along the lines of 3, you can check if you're in interrupt context with this cortex-m function and panic if you're in thread context

@Sympatron
Copy link
Contributor

I think option 3 is the best at the moment. Should be fairly simple and that way you either access USB methods from different context safely or get maximum performance at the cost of panics if it's used from multiple contexts (which I think wouldn't be too difficult to avoid). Maybe mutexes should be opt-out to make the safer option the default.

Another option would be to let the user provide a mutex with the current one as the default.

@TDHolmes
Copy link
Contributor

TDHolmes commented Apr 1, 2022

@ianrrees is there any update needed to this issue as a result of the recent USB PRs? Or is it the same story

@ianrrees ianrrees changed the title USB methods disable interrupts for too long(?) USB methods disable interrupts for too long Apr 1, 2022
@ianrrees
Copy link
Contributor Author

ianrrees commented Apr 1, 2022

Recent changes are unrelated, I was thinking about picking this back up though...

Any opposition to approach 3 from a few comments back?

@TDHolmes
Copy link
Contributor

TDHolmes commented Apr 1, 2022

Nope, I think it makes good sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants