Explicit convmixer cifar10 #374

Open: mcabbott wants to merge 7 commits into master from explicit_convmixer_cifar10

Conversation

@mcabbott (Member) commented Dec 9, 2022

This previously didn't have a manifest.

It needs a correction to the translation rule for Optimisers.ClipNorm in Flux; fixed in FluxML/Flux.jl#2145.

Also, a simple test showed increasing loss, so perhaps something else is broken?

@ToucheSir (Member) commented:

The original README does have some notes about achievable accuracy, but I don't believe anyone verified those numbers. One thing that could help is using the built-in ConvMixer model in Metalhead, because that has undergone basic convergence testing.

@mcabbott mcabbott added the update making a model work with latest Flux, etc label Dec 11, 2022
@mcabbott mcabbott force-pushed the explicit_convmixer_cifar10 branch 2 times, most recently from 1333b75 to 7f13463 Compare February 5, 2023 15:39
Comment on lines -54 to -56
function create_loss_function(dataloader, device)

    function loss(model)
@mcabbott (Member, Author):

Functions which create functions seem like an anti-pattern.
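For comparison, a minimal sketch of the more direct style this comment argues for, with the loss written as a plain function of the model and a batch; the names loss and eval_loss are illustrative, not copied from the diff:

using Flux

# Loss as an ordinary function of model and data, no closure factory needed:
loss(model, x, y) = Flux.logitcrossentropy(model(x), y)

# Average over a DataLoader when evaluating:
eval_loss(model, loader) = sum(loss(model, x, y) for (x, y) in loader) / length(loader)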

Comment on lines -93 to -96
if use_cuda
    device = gpu
    @info "Training on GPU"
else
@mcabbott (Member, Author):

This @info will lie to you unless you edit the file to change use_cuda... it's simpler to just call gpu, which does nothing if you don't have a GPU.
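A minimal sketch of that simpler pattern, assuming a recent Flux; the toy model is only for illustration. gpu moves the model and data to the GPU when CUDA is functional and is a no-op otherwise, so no use_cuda flag is needed:

using Flux

model = Chain(Conv((3, 3), 3 => 8, gelu; pad=SamePad()), Flux.flatten, Dense(8 * 32 * 32 => 10))
model = gpu(model)                      # silently stays on the CPU if no GPU is available
x = gpu(rand(Float32, 32, 32, 3, 4))    # a fake batch of four 32×32 RGB images
y = model(x)                            # runs wherever the model and data ended up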

    BatchNorm(dim),
    [
        Chain(
            SkipConnection(Chain(Conv((kernel_size,kernel_size), dim=>dim, gelu; pad=SamePad(), groups=dim), BatchNorm(dim)), +),
@mcabbott (Member, Author):

No functional change to the model, just indents etc.
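For context, a sketch of the full ConvMixer constructor in this style; the hyperparameter names (dim, depth, kernel_size, patch_size) follow the standard ConvMixer recipe and are an assumption, not an exact copy of this file:

using Flux

function ConvMixer(in_channels, dim, depth, kernel_size, patch_size, n_classes)
    Chain(
        Conv((patch_size, patch_size), in_channels => dim, gelu; stride=patch_size),
        BatchNorm(dim),
        [Chain(
            SkipConnection(
                Chain(Conv((kernel_size, kernel_size), dim => dim, gelu; pad=SamePad(), groups=dim),
                      BatchNorm(dim)),
                +),
            Conv((1, 1), dim => dim, gelu),
            BatchNorm(dim),
        ) for _ in 1:depth]...,
        AdaptiveMeanPool((1, 1)),
        Flux.flatten,
        Dense(dim => n_classes),
    )
end

model = ConvMixer(3, 256, 8, 5, 2, 10)   # e.g. CIFAR-10: 3 input channels, 10 classes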

Comment on lines 97 to 104
  opt = OptimiserChain(
      WeightDecay(1f-3),
-     ClipNorm(1.0),
-     ADAM(η)
+     ClipNorm(1f0),
+     Adam(η),
  )
@mcabbott (Member, Author):

This exposes a real error: composing ClipNorm after another optimiser means it tries to take norm(::Broadcasted) which fails:

julia> train(epochs=10, images=128)
ERROR: Scalar indexing is disallowed.
...
Stacktrace:
...
 [12] norm(itr::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(*), Tuple{Float32, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}}, p::Float32)
    @ LinearAlgebra ~/julia-9ded051e9f/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:596
 [13] apply!(o::Optimisers.ClipNorm{Float32}, state::Nothing, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, dx::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(*), Tuple{Float32, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}})
    @ Optimisers ~/.julia/packages/Optimisers/kPdJV/src/rules.jl:584

@mcabbott (Member, Author):

Should be fixed, but not yet tested here.
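A self-contained sketch of how such a chain is used with the explicit-style API this PR moves to; the toy model and data are stand-ins for the ConvMixer and CIFAR-10 batches:

using Flux   # Flux ≥ 0.14 re-exports these rules from Optimisers.jl; otherwise add `using Optimisers`

model = Chain(Dense(4 => 8, gelu), Dense(8 => 2))
x, y = randn(Float32, 4, 16), Flux.onehotbatch(rand(1:2, 16), 1:2)

opt = OptimiserChain(WeightDecay(1f-3), ClipNorm(1f0), Adam(3f-4))
opt_state = Flux.setup(opt, model)

grads = Flux.gradient(m -> Flux.logitcrossentropy(m(x), y), model)
Flux.update!(opt_state, model, grads[1])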

  end

- model = model |> cpu
- @save "model.bson" model
+ @save "model.bson" cpu(model)
@mcabbott (Member, Author):

Should really be updated to Flux.state too
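A sketch of the Flux.state approach mentioned here, assuming JLD2 as the container and an illustrative file name; the small Chain stands in for the real model:

using Flux, JLD2

model = Chain(Dense(4 => 8, relu), Dense(8 => 2))   # stand-in for the ConvMixer

model_state = Flux.state(cpu(model))                # plain nested NamedTuples and arrays
jldsave("convmixer_state.jld2"; model_state)

# Later: rebuild the model with the same hyperparameters, then restore the weights:
model_state = JLD2.load("convmixer_state.jld2", "model_state")
Flux.loadmodel!(model, model_state)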

@mcabbott mcabbott marked this pull request as ready for review March 20, 2024 22:17
opt = OptimiserChain(
    WeightDecay(1f-3), # L2 regularisation
    ClipNorm(1f0),
    Adam(3f-4), # learning rate
Member (reviewer):

Current practice would be to use AdamW and set L2 regularization there

@mcabbott (Member, Author):

Yes, but maybe it's good to show how to chain rules?
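For comparison, a sketch of both spellings; the positional arguments to AdamW (learning rate, momenta, weight decay) follow Optimisers.jl at the time of writing, so check the docstring for your version:

using Flux

η, λ = 3f-4, 1f-3

# Chained rules, as in this PR (shows how OptimiserChain composes):
opt_chain = OptimiserChain(WeightDecay(λ), ClipNorm(1f0), Adam(η))

# Reviewer's suggestion: AdamW couples the decay to the optimiser itself:
opt_adamw = OptimiserChain(ClipNorm(1f0), AdamW(η, (0.9f0, 0.999f0), λ))

Note these are not numerically identical: WeightDecay before Adam is classic L2 regularisation, while AdamW applies a decoupled decay.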

for (x1, y1) in train_loader
    # move one batch at a time to GPU; gpu(train_loader) would be another way
    x, y = gpu(x1), gpu(y1)
    grads = gradient(m -> Flux.logitcrossentropy(m(x), y; agg=sum), model)
Member (reviewer):

why agg=sum?

@mcabbott (Member, Author):

I just left this, as it probably doesn't hurt. No idea why it's there; maybe it was like that in whatever someone followed? Showing what the keyword is doesn't seem awful.
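A tiny check of what the keyword does, with made-up logits and labels; the default is agg = mean, so agg = sum just scales the loss (and hence the gradients) by the batch size:

using Flux

ŷ = randn(Float32, 10, 32)                       # logits for a batch of 32
y = Flux.onehotbatch(rand(1:10, 32), 1:10)

Flux.logitcrossentropy(ŷ, y)                     # default: agg = mean
Flux.logitcrossentropy(ŷ, y; agg = sum)          # ≈ 32 × the value above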

cpu_model = cpu(model)
BSON.@save "model.bson" cpu_model
BSON.@save "losses.bson" train_save test_save
# it's generally more robust to save just the state, like this, but
Member (reviewer):

But what? I would remove the model saving with BSON entirely, in favor of state saving with JLD2.

@mcabbott (Member, Author):

I reverted this because maybe it's not a terrible idea to show some variety of approaches, not repeat exactly the same thing in every model?

The way this model is set up, you'd have to copy-paste the code with the sizes etc. to re-build the model before you can load the state back in. So perhaps leaving this is OK?
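For completeness, loading the whole model back from the BSON file saved above looks like this; it needs Flux (and whatever else defined the layers) loaded before the call:

using BSON, Flux

BSON.@load "model.bson" cpu_model     # restores the variable `cpu_model`
model = gpu(cpu_model)                # move back to the GPU if desired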

Comment on lines +122 to +123
train_save[epoch, :] = [train_loss, train_acc]
test_save[epoch, :] = [test_loss, test_acc]
Member (reviewer):

Can we remove this history saving entirely? It seems useless.

@mcabbott (Member, Author):

It's not a crazy way to collect some numbers; maybe this pattern will make sense to some readers where others don't?

Labels: update (making a model work with latest Flux, etc.)

3 participants