[Enhancement] SIMD Extension for CPUTensor #86

Merged 39 commits on Aug 18, 2023
cac8993
Merge pull request #83 from hikettei/cpu-simd-backend
hikettei Aug 14, 2023
278b446
cl-waffe2-simd is born out
hikettei Aug 14, 2023
f55ad55
project structure for cl-waffe2 simd
hikettei Aug 14, 2023
402b0df
[Add] baseline for arithmetic ops
hikettei Aug 14, 2023
f9a460b
[Add] integer arithmetic ops
hikettei Aug 14, 2023
5c3d6df
writing headers
hikettei Aug 14, 2023
747066e
[Add] Copy
hikettei Aug 14, 2023
c3edb56
[Add] max/min
hikettei Aug 14, 2023
d806d06
[Add] max/min operators
hikettei Aug 14, 2023
ac5d746
[Add] comparisons
hikettei Aug 17, 2023
b8e5f88
comparisons
hikettei Aug 17, 2023
5a04552
[Add] All Tensor and Tensor Comparisons
hikettei Aug 17, 2023
11cd576
[Add] Scalar And Tensor Comparisons
hikettei Aug 17, 2023
fd63c3c
[Add] Included compiling cl-waffe2 simd extension options to makefile
hikettei Aug 17, 2023
0d076f7
[Add] loading shared library
hikettei Aug 17, 2023
06aa6aa
[Add] API Wrappers
hikettei Aug 17, 2023
1d21bb9
[Add] loads simd extension and calls add
hikettei Aug 17, 2023
4b6dc71
[Add] Arithmetic+Copy with SIMD
hikettei Aug 17, 2023
cdc4d49
[Add] InverseTensorNode with SIMD
hikettei Aug 17, 2023
20eed53
[Fix] won't work with stride=0
hikettei Aug 17, 2023
339fe56
[Fix] increase offsets
hikettei Aug 17, 2023
afd487f
[Fix] the result couldn't reduced properly if out tensor is broadcasted
hikettei Aug 17, 2023
b43cc37
[Add] SIMD impls: MaxValue MinValue Nodes
hikettei Aug 17, 2023
10ed17b
[Add] Logical Comparison with SIMD
hikettei Aug 18, 2023
b76ff04
[Fix] eval-when
hikettei Aug 18, 2023
ce7797b
[Fix] Max/Min Strides weren't considered well
hikettei Aug 18, 2023
b606ff9
[Add] delete_simd_extension
hikettei Aug 18, 2023
7b4f5f5
[Add] All comparison operation tests for CPU/Lisp Backend
hikettei Aug 18, 2023
b507713
[Fix] invaild casting: xxx -> double-float with broadcasting
hikettei Aug 18, 2023
98b29a9
[Document] Added sections for SIMD Extension, show-backends
hikettei Aug 18, 2023
16d6163
[Add] download_assets
hikettei Aug 18, 2023
73c5f92
[Document] Added a section for Adam
hikettei Aug 18, 2023
c651020
[Add] Current supporting status
hikettei Aug 18, 2023
92142bf
Merge pull request #84 from hikettei/cpu-simd-backend
hikettei Aug 18, 2023
205ae5e
[Refactor] Moved the definition of Im2ColNode/Col2ImNode, from nn pac…
hikettei Aug 18, 2023
0881dfd
Merge pull request #85 from hikettei/unfold-spec
hikettei Aug 18, 2023
7a33168
Merge branch 'master' into develop
hikettei Aug 18, 2023
c34ac09
[Update] docs
hikettei Aug 18, 2023
366aabc
[Fix] merge conflicts
hikettei Aug 18, 2023
28 changes: 24 additions & 4 deletions GNUmakefile
@@ -6,6 +6,8 @@ MKTEMP := mktemp
RLWRAP := rlwrap
LOGFILE := $(shell mktemp)

PYTHON := "python"
GCC := "gcc"
.DEFAULT_GOAL := help

# This code taken from https://marmelab.com/blog/2016/02/29/auto-documented-makefile.html
@@ -16,19 +18,19 @@ help:
awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}'

.PHONY: compile
compile: ## Compile whole project
compile: ## Compiles the whole project
$(SBCL) $(SBCL_OPTIONS) $(QUICKLOAD_WAFFE2) \
--eval '(asdf:compile-system :cl-waffe2)' \
--quit

.PHONY: test
test: ## Run test harness
test: ## Running a test harness
$(SBCL) $(SBCL_OPTIONS) $(QUICKLOAD_WAFFE2) \
--eval '(asdf:test-system :cl-waffe2)' \
--quit

.PHONY: recordtest
recordtest: ## Run test harness with logging
recordtest: ## Running a test harness with recording logs
$(warning This session will be recorded in $(LOGFILE))
$(RLWRAP) --logfile $(LOGFILE) \
$(SBCL) $(SBCL_OPTIONS) $(QUICKLOAD_WAFFE2) \
@@ -37,7 +39,7 @@ recordtest: ## Run test harness with logging
@printf 'This session has been recorded in %s\n' $(LOGFILE)

.PHONY: repl
repl: ## Launch REPL
repl: ## Launch REPL with loading cl-waffe2
$(SBCL) $(SBCL_OPTIONS) $(QUICKLOAD_WAFFE2)

.PHONY: rlrepl
@@ -70,6 +72,10 @@ docs: ## Generate documents
--eval '(cl-waffe2.docs:generate)' \
--quit

.PHONY: mkdocs-serve
mkdocs-serve: ## Launches the documentation server.
cd ./docs/cl-waffe2-docs && mkdocs serve

.PHONY: rt
rt: recordtest ## Alias for recordtest

@@ -107,3 +113,17 @@ add_to_init_file: ## Enable Quicklisp autoloading
--non-interactive \
--load ~/quicklisp/setup.lisp \
--eval '(ql-util:without-prompting (ql:add-to-init-file))'

.PHONY: build_simd_extension
build_simd_extension: ## Installs SIMD Extension shared library for the CPUTensor backend.
$(GCC) -O3 -march=native -shared -o \
./source/backends/cpu/cl-waffe2-simd/kernels/cl-waffe2-simd.so \
-fpic ./source/backends/cpu/cl-waffe2-simd/kernels/cl-waffe2-simd.c -lm

.PHONY: delete_simd_extension
delete_simd_extension: ## Deletes Compiled SIMD Extension shared library so that CPUTensor works under OpenBLAS
rm -rf ./source/backends/cpu/cl-waffe2-simd/kernels/cl-waffe2-simd.so

.PHONY: download_assets
download_assets: ## Downloads the training data used by the sample code.
cd ./examples/mnist && $(PYTHON) train_data.py
31 changes: 27 additions & 4 deletions cl-waffe2.asd
@@ -1,15 +1,33 @@

(in-package :cl-user)


(defpackage :cl-waffe2-simd-asd
(:use :cl :asdf :uiop))

(in-package :cl-waffe2-simd-asd)

(defsystem :cl-waffe2/simd-extension
:author "hikettei"
:licence "MIT"
:description "Utils for SIMD-Enabled Extension, CPUTensor."
:pathname "source/backends/cpu/cl-waffe2-simd"
:serial t
:depends-on (:cffi)
:components ((:file "package")
(:file "shared-object")
(:file "api")))


(defpackage :cl-waffe2-asd
(:use :cl :asdf :uiop))

(in-package :cl-waffe2-asd)

(defsystem :cl-waffe2
:author "hikettei"
:author "hikettei <[email protected]>"
:licence "MIT"
:description "Deep Learning Framework"
:description "Programmable Deep Learning Framework for Common Lisp"
:pathname "source"
:serial t
:depends-on (:cl-ppcre
@@ -21,7 +39,8 @@
:bordeaux-threads
:closer-mop
:optima
:trivial-garbage)
:trivial-garbage
:cl-waffe2/simd-extension)
;; TODO: Use components and split dependencies.
:components ((:file "threads")
(:file "vm/generic-tensor/package")
@@ -76,6 +95,7 @@
(:file "base-impl/transform")
(:file "base-impl/ir")
(:file "base-impl/reshapers")
(:file "base-impl/unfold")


(:file "vm/ir")
@@ -99,6 +119,8 @@
(:file "backends/cpu/blas-functions")
(:file "backends/cpu/arithmetic")
(:file "backends/cpu/matrix-ops")
(:file "backends/cpu/logical")


(:file "distributions/package")
(:file "distributions/generic")
@@ -160,6 +182,8 @@
(:file "viz/package")
(:file "viz/ast")
(:file "cl-waffe2-repl")



)
:in-order-to ((test-op (test-op cl-waffe2/test))))
@@ -268,4 +292,3 @@
(:file "model")
(:file "reverse")
(:file "profile")))

3 changes: 2 additions & 1 deletion docs/apis/nn.lisp
@@ -33,5 +33,6 @@
"(Conv2D 3 5 '(3 3))"))

(with-nn-doc (find-class 'MaxPool2D) 't)
(with-nn-doc (find-class 'AvgPool2D) 't)))
(with-nn-doc (find-class 'AvgPool2D) 't)
(with-nn-doc 'unfold 'function)))

1 change: 1 addition & 0 deletions docs/apis/optimizers.lisp
@@ -10,5 +10,6 @@
(with-op-doc (find-class 'AbstractOptimizer) 't)
(with-op-doc (macro-function 'defoptimizer) 'function)
(with-op-doc (find-class 'SGD) 't)
(with-op-doc (find-class 'Adam) 't)
))

16 changes: 15 additions & 1 deletion docs/apis/reference.lisp
@@ -106,6 +106,9 @@

(nodedoc Where-Operation-Node)
(nodedoc Compare-Operation-Node)

(nodedoc Im2ColNode)
(nodedoc Col2ImNode)
))


@@ -256,7 +259,12 @@ set-input:

predict:
describe ..
```"))))
```"))

(with-op-doc #'show-backends 'function)
(with-op-doc #'set-devices-toplevel 'function)

))

(with-page *lisp-tensor-backend* "[package] :cl-waffe2/backends.lisp"
(insert
@@ -274,6 +282,12 @@ It is recommended that `LispTensor` are installed in the lowest priority of `*us
(with-page *cpu-tensor-backend* "[package] :cl-waffe2/backends.cpu"
(insert
"The package `:cl-waffe2/backends.cpu` provides an AbstractTensor `CPUTensor` where most of its implementation relies on foreign libraries (e.g.: OpenBLAS, oneDNN in the coming future).")

(insert "
## Enabling the SIMD Extension

By design, the package that provides SIMD-enabled CPUTensor implementations for some operations (e.g.: `!max`, `!min`, sparse matrix support, `SLEEF`) is not enabled by default. To enable it, run `make build_simd_extension` in the same directory as `cl-waffe2.asd`. You can check that it has been loaded properly with the `(show-backends)` function.
")

(macrolet ((with-op-doc (name type &body body)
`(progn
58 changes: 53 additions & 5 deletions docs/cl-waffe2-docs/docs/base-impl-nodes.md
@@ -1616,8 +1616,7 @@ Where-Operation-Node is a node which set `true-then`, if the result of calling `
✅ Already defined.

```lisp
((self dout da do) (declare (ignore dout da do)) ;; todo: :no-grad t
(values nil nil))
((self dout da do) (declare (ignore dout da do)) (values nil nil))
```

No need to implement backwards at `define-impl`. (they'd be ignored.)
@@ -1647,8 +1646,57 @@ Compare-Operation-Node is a node which set `true-then`, if the result of calling
✅ Already defined.

```lisp
((self dout da db do) (declare (ignore dout da db do)) ;; todo: :no-grad t
(values nil nil nil))
((self dout da db do) (declare (ignore dout da db do)) (values nil nil nil))
```

No need to implement backwards at `define-impl`. (they'd be ignored.)
## [node] IM2COLNODE

```
(X[N C H W] COL[N C K-H K-W H-OUT W-OUT] -> COL[N C K-H K-W H-OUT W-OUT])
```

No need to implement backwards at `define-impl`. (they'd be ignored.)
### Description

Im2ColNode is an `AbstractNode` that implements the forward propagation of [nn.Unfold](https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html).

The node is only executed through the `cl-waffe2/nn:unfold` function, so the constructor arguments are dispatched automatically. In addition, the tensor `X` it receives has already been padded.

`N` is the batch size and `C` is the number of channels. `k-h` and `k-w` are the kernel height and width, respectively. `h-out` and `w-out` are the output size. `stride-x` and `stride-y` are the strides, in most cases given by the stride argument of `Pooling2D` or `Conv2D`. `img-out` is an AbstractTensor with the shape `(N C H-in W-in)` and can be read with `img-out-of`. All symbols are exported from the `cl-waffe2/base-impl` package.

To provide a device-specific implementation of `Unfold`, use `define-impl` on `Im2ColNode` and `Col2ImNode`.
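
The index mapping implied by the signature `X[N C H W] -> COL[N C K-H K-W H-OUT W-OUT]` can be sketched in plain C as below. This is only an illustrative reference loop, not code from cl-waffe2: the name `im2col_ref`, the `float` element type, the contiguous row-major layout, and the pairing of `stride_y` with the height axis and `stride_x` with the width axis are all assumptions made for the example. As in the description, `X` is expected to be padded already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not part of cl-waffe2).
 * Copies every kernel window of a padded image x[N][C][H][W]
 * into col[N][C][K_H][K_W][H_OUT][W_OUT]. Both buffers are
 * assumed to be contiguous and row-major. */
void im2col_ref(const float *x, float *col,
                size_t n, size_t c, size_t h, size_t w,
                size_t k_h, size_t k_w,
                size_t h_out, size_t w_out,
                size_t stride_y, size_t stride_x)
{
    /* The last window must still fit inside the padded image. */
    assert(h_out > 0 && w_out > 0);
    assert(h >= (h_out - 1) * stride_y + k_h);
    assert(w >= (w_out - 1) * stride_x + k_w);

    for (size_t ni = 0; ni < n; ni++)
    for (size_t ci = 0; ci < c; ci++)
    for (size_t kh = 0; kh < k_h; kh++)
    for (size_t kw = 0; kw < k_w; kw++)
    for (size_t ho = 0; ho < h_out; ho++)
    for (size_t wo = 0; wo < w_out; wo++) {
        /* Source pixel: window origin plus offset inside the kernel. */
        size_t hi = ho * stride_y + kh;
        size_t wi = wo * stride_x + kw;
        size_t x_idx = ((ni * c + ci) * h + hi) * w + wi;
        size_t col_idx =
            ((((ni * c + ci) * k_h + kh) * k_w + kw) * h_out + ho) * w_out + wo;
        col[col_idx] = x[x_idx];
    }
}
```

With a 3x3 single-channel image, a 2x2 kernel and stride 1, `col` holds 16 values: four 2x2 output planes, one per kernel offset.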


### Backward

✅ Already defined.

```lisp
((self dout x col) (declare (ignore x col))
(with-slots ((n n) (c c) (k-h k-h) (k-w k-w) (h-out h-out) (w-out w-out)
(stride-x stride-x) (stride-y stride-y))
self
(values
(call
(col2imnode n c (h-of self) (w-of self) k-h k-w h-out w-out stride-x
stride-y (img-out-of self))
dout)
nil)))
```

No need to implement backwards at `define-impl`. (they'd be ignored.)
## [node] COL2IMNODE

```
(COL[N C K-H K-W H-OUT W-OUT] -> X[N C H W])
```

### Description

Col2ImNode is an `AbstractNode` that implements the backward propagation of [nn.Unfold](https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html).

See also: `Im2ColNode` documentation for argument descriptions.
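
For intuition, the mapping `COL[N C K-H K-W H-OUT W-OUT] -> X[N C H W]` can be sketched in plain C as a scatter-add, since overlapping windows contribute to the same pixel when gradients flow back. This is a hedged illustration rather than cl-waffe2's implementation: the name `col2im_ref`, the `float` element type, the contiguous row-major layout, the stride-to-axis pairing, and the choice to zero the output first are all assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not part of cl-waffe2).
 * Accumulates col[N][C][K_H][K_W][H_OUT][W_OUT] back into
 * x[N][C][H][W]. Both buffers are assumed contiguous, row-major. */
void col2im_ref(const float *col, float *x,
                size_t n, size_t c, size_t h, size_t w,
                size_t k_h, size_t k_w,
                size_t h_out, size_t w_out,
                size_t stride_y, size_t stride_x)
{
    /* The last window must still fit inside the (padded) image. */
    assert(h_out > 0 && w_out > 0);
    assert(h >= (h_out - 1) * stride_y + k_h);
    assert(w >= (w_out - 1) * stride_x + k_w);

    /* Start from zero so that contributions can be summed. */
    for (size_t i = 0; i < n * c * h * w; i++)
        x[i] = 0.0f;

    for (size_t ni = 0; ni < n; ni++)
    for (size_t ci = 0; ci < c; ci++)
    for (size_t kh = 0; kh < k_h; kh++)
    for (size_t kw = 0; kw < k_w; kw++)
    for (size_t ho = 0; ho < h_out; ho++)
    for (size_t wo = 0; wo < w_out; wo++) {
        /* Destination pixel: window origin plus kernel offset. */
        size_t hi = ho * stride_y + kh;
        size_t wi = wo * stride_x + kw;
        size_t x_idx = ((ni * c + ci) * h + hi) * w + wi;
        size_t col_idx =
            ((((ni * c + ci) * k_h + kh) * k_w + kw) * h_out + ho) * w_out + wo;
        x[x_idx] += col[col_idx];
    }
}
```

Feeding a `col` filled with ones for a 3x3 image, 2x2 kernel and stride 1 yields the number of windows covering each pixel: 1 at the corners, 2 on the edges, and 4 at the center.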

### Backward

❌ Undefined. (To make it differentiable, must be defined with `define-impl` macro.)
10 changes: 5 additions & 5 deletions docs/cl-waffe2-docs/docs/base-impl.md
@@ -1693,7 +1693,7 @@ The function a>scal sets `true-then` if the equation: `element > scal` is t, oth
`A` AbstractTensor
`scal` number (not a ScalarTensor)

(TODO: ScalarTensor as scal)
(TODO: ScalarTensor as a `scal` argument)
## [function] a<scal

```
@@ -1707,7 +1707,7 @@ The function a<scal sets `true-then` if the equation: `element < scal` is t, oth
`A` AbstractTensor
`scal` number (not a ScalarTensor)

(TODO: ScalarTensor as scal)
(TODO: ScalarTensor as a `scal` argument)
## [function] a>=scal

```
@@ -1721,7 +1721,7 @@ The function a>=scal sets `true-then` if the equation: `element >= scal` is t, o
`A` AbstractTensor
`scal` number (not a ScalarTensor)

(TODO: ScalarTensor as scal)
(TODO: ScalarTensor as a `scal` argument)
## [function] a<=scal

```
@@ -1735,7 +1735,7 @@ The function a<=scal sets `true-then` if the equation: `element <= scal` is t, o
`A` AbstractTensor
`scal` number (not a ScalarTensor)

(TODO: ScalarTensor as scal)
(TODO: ScalarTensor as a `scal` argument)
## [function] a=scal

```
@@ -1749,7 +1749,7 @@ The function a=scal sets `true-then` if the equation: `element = scal` is t, oth
`A` AbstractTensor
`scal` number (not a ScalarTensor)

(TODO: ScalarTensor as scal)
(TODO: ScalarTensor as a `scal` argument)
## [function] a>b

```
5 changes: 4 additions & 1 deletion docs/cl-waffe2-docs/docs/cpu-tensor-backend.md
@@ -1,5 +1,8 @@

# [package] :cl-waffe2/backends.cpu
The package `:cl-waffe2/backends.cpu` provides an AbstractTensor `CPUTensor` where most of its implementation relies on foreign libraries (e.g.: OpenBLAS, oneDNN in the coming future).
## [AbstractTensor] CPUTensor
## Enabling the SIMD Extension

By design, the package that provides SIMD-enabled CPUTensor implementations for some operations (e.g.: `!max`, `!min`, sparse matrix support, `SLEEF`) is not enabled by default. To enable it, run `make build_simd_extension` in the same directory as `cl-waffe2.asd`. You can check that it has been loaded properly with the `(show-backends)` function.

## [AbstractTensor] CPUTensor