tekaratzas reviewed on Oct 21, 2025
impl Default for LLM {
    fn default() -> Self {
        let transformer_block = TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM);
        let num_heads = 8; // Default to 8 attention heads
Owner
I feel this should live with the rest of the constants, e.g. MAX_SEQ_LEN and such.
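For illustration only, a minimal sketch of that suggestion, assuming the shared constants live in src/lib.rs (the file path and the MAX_SEQ_LEN / HIDDEN_DIM values here are assumptions, not taken from this PR):

// src/lib.rs (assumed location of the shared constants)
pub const MAX_SEQ_LEN: usize = 80;     // existing constant; value assumed
pub const EMBEDDING_DIM: usize = 128;  // existing constant, per the diff comments below
pub const HIDDEN_DIM: usize = 256;     // existing constant; value assumed
pub const NUM_HEADS: usize = 8;        // proposed shared default for attention heads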
tekaratzas reviewed on Oct 21, 2025
let transformer_block_2 = TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM);
let transformer_block_3 = TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM);
// Using 8 attention heads (EMBEDDING_DIM=128 / 8 = 16 dims per head)
let num_heads = 8;
Owner
We should share this via a universal const.
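A sketch of how the construction sites could then pick up a shared NUM_HEADS constant instead of a local literal; the three-argument TransformerBlock::new signature is the one described in the PR summary at the end of this thread, and this is not the actual diff:

// Replace the local `let num_heads = 8;` with the shared constant.
let transformer_block_2 = TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM, NUM_HEADS);
let transformer_block_3 = TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM, NUM_HEADS);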
tekaratzas reviewed on Oct 21, 2025
impl Default for SelfAttention {
    fn default() -> Self {
        SelfAttention::new(EMBEDDING_DIM)
        SelfAttention::new(EMBEDDING_DIM, 8) // 8 attention heads by default
tekaratzas reviewed on Oct 21, 2025
w_o: Array2::from_shape_fn((embedding_dim, embedding_dim), |_| normal.sample(&mut rng)),
cached_input: None,
cached_q: None,
cached_k: None,
tekaratzas reviewed on Oct 21, 2025
fn test_self_attention_forward() {
    // Create self-attention module
    let mut self_attention = SelfAttention::new(EMBEDDING_DIM);
    // #[test]
Owner
Let's get rid of this commented section: either uncomment it or delete it.
Owner
Hey! Thanks for the PR! Gave it a quick glance. Will take another closer pass once I get some energy. Also got some merge conflicts.
Author
Alright, I will work on the modifications.
Please test thoroughly before approving; reject the PR if there are any errors.
In src/self_attention.rs, I completely refactored the module to implement multi-head attention, as you suggested earlier.
I added num_heads and head_dim fields to track the attention heads,
added an output projection matrix w_o for combining the head outputs,
implemented the head splitting and concatenation logic,
and updated the forward and backward passes to handle multiple attention heads.
In src/transformer.rs, I updated the constructor to accept a num_heads parameter, so TransformerBlock::new() now takes (embedding_dim, hidden_dim, num_heads).
In src/main.rs, I updated the setup to use multi-head attention, with a default configuration of 8 attention heads.
Furthermore, in src/llm.rs, I updated the Default implementation to use 8 heads.
The tests were updated as well: tests/self_attention_test.rs, tests/transformer_test.rs, and tests/llm_test.rs now use the new constructor signature.
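To make the head splitting and recombination described above concrete, here is a minimal sketch assuming (seq_len, embedding_dim) activations stored as ndarray Array2<f32>; the struct and method names are illustrative and are not the actual fields of src/self_attention.rs:

use ndarray::{concatenate, s, Array2, Axis};

// Sketch only: a cut-down view of the multi-head bookkeeping.
struct MultiHeadSketch {
    num_heads: usize,
    head_dim: usize,   // embedding_dim / num_heads, e.g. 128 / 8 = 16
    w_o: Array2<f32>,  // (embedding_dim, embedding_dim) output projection
}

impl MultiHeadSketch {
    // Split a (seq_len, embedding_dim) matrix into num_heads
    // (seq_len, head_dim) slices, one per attention head.
    fn split_heads(&self, x: &Array2<f32>) -> Vec<Array2<f32>> {
        (0..self.num_heads)
            .map(|h| {
                let start = h * self.head_dim;
                x.slice(s![.., start..start + self.head_dim]).to_owned()
            })
            .collect()
    }

    // Concatenate the per-head outputs back to (seq_len, embedding_dim)
    // and apply the output projection w_o.
    fn combine_heads(&self, heads: &[Array2<f32>]) -> Array2<f32> {
        let views: Vec<_> = heads.iter().map(|h| h.view()).collect();
        let merged = concatenate(Axis(1), &views).expect("head shapes must match");
        merged.dot(&self.w_o)
    }
}

Under the constructor signature described above, wiring it up would look roughly like TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM, 8), with each head attending over a 16-dimensional slice of the 128-dimensional embedding.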