
Commit 09cdca2

Improved documentation
1 parent 9898f98 commit 09cdca2

3 files changed: +27 -6


.vscode/tasks.json (+1 -1)

@@ -62,7 +62,7 @@
     {
       "label": "Build Documentation",
       "type": "shell",
-      "command": "cargo xdoc --open",
+      "command": "cargo xdoc --open --all-features",
       "problemMatcher": {
         "base": "$rustc",
         "fileLocation": [

src/alloc.rs (+12)

@@ -5,6 +5,18 @@
 //! The allocators can be safely used in a mixed fashion. (Including multiple GeneralAllocators
 //! with different thresholds)
 //!
+//! **NOTE: the default implementations of memcpy, memset, etc. which are used behind the scenes
+//! use unaligned accesses.** This causes exceptions when used together with IRAM.
+//! The replacements in the mem module do handle alignment properly. They can be enabled by
+//! including the following in Cargo.toml:
+//! ```
+//! [package.metadata.cargo-xbuild]
+//! memcpy = false
+//! [features]
+//! mem = []
+//! ```
+//!
+//!
 //! # TODO:
 //! - Improve underlying heap allocator: support for realloc, speed etc.
 //!
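
For reference, a minimal sketch of how a downstream application's Cargo.toml might combine these two settings. The crate name esp32-alloc is a placeholder for whichever crate provides this mem module, and the exact dependency declaration depends on how the crate is consumed:

# Hypothetical application Cargo.toml; the crate name below is a placeholder.
[dependencies]
# Enable the crate's `mem` feature so its aligned memcpy/memset replacements are compiled in.
esp32-alloc = { version = "0.1", features = ["mem"] }

[package.metadata.cargo-xbuild]
# Ask cargo-xbuild not to provide the default (unaligned) mem routines from compiler-builtins.
memcpy = false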

src/mem.rs (+14 -5)

@@ -2,20 +2,29 @@
 //!
 //! These are normally part of the compiler-builtins crate. However the default routines
 //! do not use word sized aligned instructions, which is slow and moreover leads to crashes
-//! when using IRAM (which only allows aligned accesses).
+//! when using memories/processors which only allow aligned accesses.
 //!
 //! Implementation is optimized for large blocks of data. Assumption is that for small data,
-//! they are inlined by the compiler. Some optimization done for often small sizes as
-//! otherwise lot of slowdown in debug mode.
+//! they are inlined by the compiler. Some optimization is done for often used small sizes, as
+//! there is otherwise a significant slowdown in debug mode.
 //!
 //! Implementation is optimized when dst/s1 and src/s2 have the same alignment.
 //! If alignment of s1 and s2 is unequal, then either s1 or s2 accesses are not aligned
 //! resulting in slower performance. (If s1 or s2 is aligned, then those accesses are aligned.)
 //!
-//! Further optimization is possible by having dedicated code path for unaligned accesses,
+//! Further optimization is possible by having a dedicated code path for unaligned accesses,
 //! which uses 2*PTR_SIZE to PTR_SIZE shift operation (i.e. llvm.fshr);
-//! but implementation of this intrinsic is not well optimized on all platforms.
+//! but the implementation of this intrinsic is not yet optimized and currently leads to worse results.
 //!
+//! Also, loop unrolling in the memcpy_reverse function is not fully optimal due to a current
+//! llvm limitation: it uses an add with negative offset plus a store, instead of a store with
+//! positive offset, so 3 instructions per loop instead of 2.
+//!
+//! A further future optimization possibility is using the zero overhead loop, but this is
+//! currently not yet supported by llvm for xtensa.
+//!
+//! For large aligned blocks, memset and memcpy reach ~88% of the maximum memory bandwidth;
+//! memcpy_reverse reaches ~60%.
 #[allow(warnings)]
 #[cfg(target_pointer_width = "64")]
 type c_int = u64;
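
To make the alignment strategy described in these comments concrete, here is a minimal, self-contained sketch of a word-aligned forward copy in the same spirit. It is not the code from this commit; the function name aligned_memcpy and the byte-wise fallback for mismatched alignments are illustrative assumptions only.

use core::mem::size_of;

/// Illustrative word-aligned forward copy (not the crate's actual implementation).
/// Copies bytes until `dst` is word aligned, then copies whole words while both
/// pointers are aligned, and finishes with the remaining tail bytes.
pub unsafe fn aligned_memcpy(mut dst: *mut u8, mut src: *const u8, mut n: usize) -> *mut u8 {
    const WORD: usize = size_of::<usize>();
    let ret = dst;

    // Head: copy byte-by-byte until dst reaches word alignment (or n runs out).
    while n > 0 && (dst as usize) % WORD != 0 {
        *dst = *src;
        dst = dst.add(1);
        src = src.add(1);
        n -= 1;
    }

    // Body: if src is now word aligned as well, copy word-sized chunks.
    if (src as usize) % WORD == 0 {
        while n >= WORD {
            *(dst as *mut usize) = *(src as *const usize);
            dst = dst.add(WORD);
            src = src.add(WORD);
            n -= WORD;
        }
    }

    // Tail (or the whole remainder if src stayed misaligned): copy byte-by-byte.
    while n > 0 {
        *dst = *src;
        dst = dst.add(1);
        src = src.add(1);
        n -= 1;
    }

    ret
}

The byte-wise fallback when dst and src end up with different alignments mirrors the trade-off noted above: a dedicated shift-based (llvm.fshr) path is possible but, per the comment, not yet profitable.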
