//!
//! These are normally part of the compiler-builtins crate. However, the default routines
//! do not use word-sized aligned instructions, which is slow and moreover leads to crashes
//! when using memories/processors which only allow aligned accesses.
//!
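//! As a rough illustration of the idea (a simplified sketch; the name `fill_word_aligned`
//! and its exact shape are assumptions, not this module's code): handle the unaligned head
//! byte by byte, do the bulk with aligned word-sized stores, then finish the tail in bytes.
//!
//! ```ignore
//! // Hypothetical sketch of a word-aligned fill; not the routine in this module.
//! unsafe fn fill_word_aligned(mut dst: *mut u8, val: u8, mut n: usize) {
//!     const WORD: usize = core::mem::size_of::<usize>();
//!     // Byte stores until `dst` is word aligned (or the buffer is exhausted).
//!     while n > 0 && (dst as usize) % WORD != 0 {
//!         dst.write(val);
//!         dst = dst.add(1);
//!         n -= 1;
//!     }
//!     // Replicate the byte across a word (0x0101..01 * val); every store here is aligned.
//!     let word = usize::from(val) * (usize::MAX / 255);
//!     while n >= WORD {
//!         (dst as *mut usize).write(word);
//!         dst = dst.add(WORD);
//!         n -= WORD;
//!     }
//!     // Remaining tail bytes.
//!     while n > 0 {
//!         dst.write(val);
//!         dst = dst.add(1);
//!         n -= 1;
//!     }
//! }
//! ```
//!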
//! The implementation is optimized for large blocks of data. The assumption is that calls with
//! small sizes are inlined by the compiler. Some optimization is done for frequently used small
//! sizes, as otherwise there is a significant slowdown in debug mode.
//!
//! The implementation is optimized for the case where dst/s1 and src/s2 have the same alignment.
//! If the alignment of s1 and s2 is unequal, then either the s1 or the s2 accesses are not aligned,
//! resulting in slower performance. (If s1 or s2 is aligned, then those accesses are aligned.)
//!
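//! As a rough sketch of why equal alignment helps (illustrative only; the name
//! `copy_same_alignment` and its shape are assumptions, not this module's code): once the
//! byte-wise head loop brings dst to a word boundary, src reaches a word boundary at the same
//! time, so the bulk loop can use aligned word accesses on both sides.
//!
//! ```ignore
//! // Hypothetical sketch of the equal-alignment fast path.
//! unsafe fn copy_same_alignment(mut dst: *mut u8, mut src: *const u8, mut n: usize) {
//!     const WORD: usize = core::mem::size_of::<usize>();
//!     // Precondition: dst and src share the same offset within a word.
//!     debug_assert_eq!((dst as usize) % WORD, (src as usize) % WORD);
//!     // Byte copies until both pointers reach a word boundary together.
//!     while n > 0 && (dst as usize) % WORD != 0 {
//!         dst.write(src.read());
//!         dst = dst.add(1);
//!         src = src.add(1);
//!         n -= 1;
//!     }
//!     // Bulk: every load and every store is an aligned word access.
//!     while n >= WORD {
//!         (dst as *mut usize).write((src as *const usize).read());
//!         dst = dst.add(WORD);
//!         src = src.add(WORD);
//!         n -= WORD;
//!     }
//!     // Tail bytes.
//!     while n > 0 {
//!         dst.write(src.read());
//!         dst = dst.add(1);
//!         src = src.add(1);
//!         n -= 1;
//!     }
//! }
//! ```
//!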
//! Further optimization is possible by having a dedicated code path for unaligned accesses,
//! which uses a 2*PTR_SIZE to PTR_SIZE shift operation (i.e. llvm.fshr);
//! but the implementation of this intrinsic is not yet optimized and currently leads to worse
//! results.
//!
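//! For illustration, such a shift-based path could look roughly like the sketch below
//! (assumptions: little-endian byte order, a source offset strictly between 0 and the word
//! size, and the hypothetical name `copy_unaligned_src`; the combine step is what a single
//! llvm.fshr would express):
//!
//! ```ignore
//! // Hypothetical sketch; reads whole aligned source words, so it may touch a few bytes
//! // just outside the exact source range.
//! unsafe fn copy_unaligned_src(mut dst: *mut usize, src: *const u8, words: usize) {
//!     const BITS: u32 = usize::BITS;
//!     let offset = (src as usize) % core::mem::size_of::<usize>();
//!     let shift = (offset * 8) as u32;
//!     // Aligned word pointer just below `src`.
//!     let mut s = (src as usize - offset) as *const usize;
//!     let mut cur = s.read();
//!     for _ in 0..words {
//!         s = s.add(1);
//!         let next = s.read();
//!         // Combine two aligned source words into one destination word.
//!         dst.write((cur >> shift) | (next << (BITS - shift)));
//!         dst = dst.add(1);
//!         cur = next;
//!     }
//! }
//! ```
//!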
//! Also, loop unrolling in the memcpy_reverse function is not fully optimal due to a current
//! llvm limitation: it uses an add with a negative offset plus a store, instead of a store with
//! a positive offset, so 3 instructions per loop iteration instead of 2.
//!
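//! For context, a reverse copy simply walks from the end of the buffers toward the start so
//! that overlapping moves with dst above src stay correct; the sketch below is illustrative
//! only (the name `copy_words_reverse` is an assumption) and ignores head/tail byte handling.
//!
//! ```ignore
//! // Hypothetical sketch of a high-to-low word copy.
//! unsafe fn copy_words_reverse(dst: *mut usize, src: *const usize, words: usize) {
//!     let mut i = words;
//!     while i > 0 {
//!         i -= 1;
//!         // Copy the highest remaining word first.
//!         dst.add(i).write(src.add(i).read());
//!     }
//! }
//! ```
//!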
//! A further future optimization possibility is using the zero-overhead loop feature, but that
//! is currently not yet supported by llvm for xtensa.
//!
//! For large aligned blocks, memset and memcpy reach ~88% of the maximum memory bandwidth;
//! memcpy_reverse reaches ~60%.
#[allow(warnings)]
#[cfg(target_pointer_width = "64")]
type c_int = u64;
|