-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
no-std support #124
base: master
Are you sure you want to change the base?
no-std support #124
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Would it be possible to add some documentation on the no_std
feature in the README
and perhaps a regression test? Not sure if doctests lets your specify the features, but something like that would be amazing!
src/array_utils.rs
Outdated
@@ -1,6 +1,8 @@ | |||
use crate::Complex; | |||
use crate::FftNum; | |||
use std::ops::{Deref, DerefMut}; | |||
use core::ops::{Deref, DerefMut}; | |||
#[allow(unused_imports)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? Can't you feature gate the import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my bad. Fixed in a6f4c2e.
Cargo.toml
Outdated
avx = ["std"] | ||
sse = ["std"] | ||
neon = ["std"] | ||
wasm_simd = ["std"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no way to get these on no_std
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I gave a bash at actually using this crate from a
And note local version of Once you weed out all the I eventually got something that seems like it's building correctly on my machine. I'm not really very familiar with #![feature(
lang_items,
start,
core_intrinsics,
rustc_private,
panic_unwind,
rustc_attrs
)]
#![allow(internal_features)]
#![no_std]
extern crate unwind;
use core::alloc::Layout;
use core::cell::UnsafeCell;
use core::panic::PanicInfo;
use core::ptr::null_mut;
use core::sync::atomic::AtomicUsize;
use core::sync::atomic::Ordering::SeqCst;
use core::{alloc::GlobalAlloc, intrinsics};
use rustfft::{num_complex::Complex, FftPlanner};
const ARENA_SIZE: usize = 128 * 1024;
const MAX_SUPPORTED_ALIGN: usize = 4096;
#[repr(C, align(4096))] // 4096 == MAX_SUPPORTED_ALIGN
struct SimpleAllocator {
arena: UnsafeCell<[u8; ARENA_SIZE]>,
remaining: AtomicUsize, // we allocate from the top, counting down
}
#[global_allocator]
static ALLOCATOR: SimpleAllocator = SimpleAllocator {
arena: UnsafeCell::new([0x55; ARENA_SIZE]),
remaining: AtomicUsize::new(ARENA_SIZE),
};
unsafe impl Sync for SimpleAllocator {}
unsafe impl GlobalAlloc for SimpleAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
let size = layout.size();
let align = layout.align();
// `Layout` contract forbids making a `Layout` with align=0, or align not power of 2.
// So we can safely use a mask to ensure alignment without worrying about UB.
let align_mask_to_round_down = !(align - 1);
if align > MAX_SUPPORTED_ALIGN {
return null_mut();
}
let mut allocated = 0;
if self
.remaining
.fetch_update(SeqCst, SeqCst, |mut remaining| {
if size > remaining {
return None;
}
remaining -= size;
remaining &= align_mask_to_round_down;
allocated = remaining;
Some(remaining)
})
.is_err()
{
return null_mut();
};
self.arena.get().cast::<u8>().add(allocated)
}
unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}
#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
let mut planner = FftPlanner::new();
let fft = planner.plan_fft_forward(1234);
let mut buffer = [Complex {
re: 0.0f32,
im: 0.0f32,
}; 1234];
fft.process(&mut buffer);
0
}
#[lang = "eh_personality"]
fn rust_eh_personality() {}
#[panic_handler]
fn panic_handler(_info: &PanicInfo) -> ! {
intrinsics::abort()
} So to actually get
|
Thanks a lot! The crate now compiles on |
Nice work! :) Think your
I think you need to ensure either |
README.md
Outdated
### std | ||
|
||
The scalar implementation (no avx, sse, neon or wasm SIMD) is available for no-std targets. To build the crate as no-std simply disable default fetures. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently you also need to enable libm
feature. Would be cool if default-features = false
implies libm
or something. Same with primal-check
crate. I'm not really sure what the generally acceptable thing to do is in Rust but if it's not possible to express "either std
or libm
" needs to be enabled, then you should add documentation on enabling libm
here.
My primal-check fix was just meant to get the crate to compile on no-std, so yeah it's not great 🙃. I have an example working a stm32f411 nucleo board. The library is really big though, it takes up 220KiB of flash (which takes several seconds to upload to the chip) and computing an FFT of size 1234 needs 48KiB of heap. EDIT: Example is now in the repo. Constructing an FFT dynamically might not be the best approach on embedded. It would be much better to construct the FFTs at compile time and only include the relevant algorithms in the binary (though this would make it impossible to construct dynamically-sized FFTs, so there would need to be a way to do both). Giving the option to store the twiddles in flash rather than sram would be nice, too. Statically-sized FFTs would also be a nice feature for std applications, as this would likely reduce binary size and could be faster by not requiring allocations. I'm not sure how to implement this, though. The use of nightly features would probably be required, complicated const functions are hard to do in stable rust in my experience. |
Another issue I found when compiling for |
This PR replaces all std imports in the scalar implementation with analogues from core and alloc and makes std an optional feature enabled by default.
HashMap implementation is taken from hashbrown (which std uses internally).
Relevant issues: #116, #122.
I ran the benchmarks overnight and although the full suite takes longer to perform (on my x86-64 Linux machine: median 869 s vs 767 s before), the individual test results are somewhat more balanced (68 tests run more than 20% slower and 67 tests more than 20% faster).
Here are the test results. The first number is the median time taken by the new implementation divided by the median time taken by the old implementation. (Each version was run 15 times).
I am not sure where the performance difference comes from.