Work In Progress: Vision Transformers for classifier network #495

larissapoghosyan · 2025-03-16T20:35:18Z

Description

This pull request adds a Vision Transformer (ViT) implementation to cellfinder's classifier module, which currently supports only ResNet. As far as I know, ViT is the state of the art for brain image cell classification, so I followed the Keras abstractions and made sure the new models are compatible with the current codebase.

The plan for future steps:

Any guidance and feedback as I move forward would be highly appreciated.
@IgorTatarnikov @alessandrofelder @adamltyson How do you feel about this plan?

References

This PR is heavily inspired by:

Codebases with ViTs for brain

SoTA currently:
- CellViT: https://github.com/TIO-IKIM/CellViT
  - paper
- CellViT++: https://github.com/TIO-IKIM/CellViT-Plus-Plus
  - paper
- TransSeg: https://github.com/yuhui-zh15/TransSeg/tree/main/src/backbones/encoders
  - paper

Other

Implementation pointers in Keras

Vision Transformers (ViT) in Keras, but for 2D only:
- https://keras.io/examples/vision/image_classification_with_vision_transformer/
3D Vision Transformers in Keras, but for video:
- https://keras.io/examples/vision/vivit/
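
For orientation, the main change when moving from the 2D example to 3D volumes is that patches become small cuboids, so the token count is a product over three axes. Below is a minimal sketch of that arithmetic. The 50 x 50 x 20 cube with 2 channels and the 10 x 10 x 5 patch size are assumptions chosen for illustration, and the helper names are hypothetical; none of them are taken from this PR.

```python
from math import prod


def patch_grid(volume_shape, patch_shape):
    """Return how many non-overlapping patches fit along each axis."""
    if len(volume_shape) != len(patch_shape):
        raise ValueError("volume and patch must have the same number of axes")
    for size, patch in zip(volume_shape, patch_shape):
        if size % patch != 0:
            raise ValueError(f"axis of size {size} is not divisible by {patch}")
    return tuple(size // patch for size, patch in zip(volume_shape, patch_shape))


def token_layout(volume_shape, patch_shape, channels):
    """Return (number of tokens, flattened length of each token)."""
    grid = patch_grid(volume_shape, patch_shape)
    return prod(grid), prod(patch_shape) * channels


# Assumed shapes, for illustration only.
num_tokens, token_dim = token_layout((50, 50, 20), (10, 10, 5), channels=2)
print(num_tokens, token_dim)
```

Smaller patches give more tokens, and self-attention cost grows roughly with the square of the token count, so the patch size is one of the first knobs to look at when memory is tight.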

Related Issues and Feature Request

How has this PR been tested?

Tested only on a tiny sample dataset. For full-scale testing I need access to a machine with a GPU. Please let me know if you can help me with this.

Is this a breaking change?

This is based on Keras abstractions, and is fully compatible with current classifiers.

Visualization (the smaller, 4-layer ViT)

BTW, the code used for visualization:

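import keras
from cellfinder.core.classify.tools import get_model

# Build the smaller (4-layer) ViT classifier added in this PR.
model = get_model(network_depth="vit-4-layer")

# Draw the layer graph; plot_model needs pydot and graphviz installed.
keras.utils.plot_model(
    model,
    show_shapes=True,
    show_dtype=True,
    show_layer_names=True,
    expand_nested=True,
    dpi=50,
    show_layer_activations=True,
)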
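
The `network_depth` string above now has to carry the model family as well as the depth. How cellfinder actually detects the model type is not shown in this thread, so the following is only a sketch under an assumed naming scheme: `vit-<N>-layer` for the new models and plain `<N>-layer` (for example `50-layer`) for ResNet. The function name and the regular expression are hypothetical.

```python
import re

# Assumed naming scheme: "<N>-layer" for ResNet, "<family>-<N>-layer" otherwise.
_NAME_PATTERN = re.compile(r"^(?:(?P<family>[a-z]+)-)?(?P<depth>\d+)-layer$")


def parse_network_depth(name):
    """Split a network_depth string into (family, number of layers)."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised network_depth: {name!r}")
    # Names without a family prefix are treated as ResNet in this sketch.
    family = match.group("family") or "resnet"
    return family, int(match.group("depth"))


print(parse_network_depth("vit-4-layer"))
print(parse_network_depth("50-layer"))
```

Keeping unprefixed names mapped to ResNet would let existing configurations and saved settings continue to work unchanged.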

adamltyson · 2025-03-17T07:42:08Z

Hi @larissapoghosyan, this looks great! This is just what I was thinking of. I think you're on the right lines adding a new architecture, but keeping it as close to the resnet implementation as possible. There will be many users who have trained resnet models, who will want to stay with the old architecture for some time.

A few comments spring to mind:

- Benchmarking is the most important thing. There's not much point adding a new architecture if it isn't significantly more accurate, quicker, or less resource-hungry (e.g. memory, training data).
- At some point this model choice will need to be exposed in the Python API, the training CLI, the brainmapper CLI and the napari plugin.
- Optimize data loading for cell classification #493 is very likely to be merged, so it would be good to make sure this PR is compatible with that one.
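
A comparison along the lines of the first point could start from a small harness like the one below. It is a sketch only: `benchmark` is a hypothetical helper, the two "models" are toy stand-ins rather than trained classifiers, and it measures just accuracy and wall-clock time per sample (memory use and training-data requirements would need separate measurements).

```python
import time


def benchmark(predict_fn, samples, labels):
    """Measure accuracy and mean prediction time for one classifier."""
    if len(samples) != len(labels) or not samples:
        raise ValueError("samples and labels must be non-empty and equal in length")
    start = time.perf_counter()
    predictions = [predict_fn(sample) for sample in samples]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {
        "accuracy": correct / len(labels),
        "seconds_per_sample": elapsed / len(labels),
    }


# Toy stand-ins for two trained models; a real run would wrap model.predict.
samples = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
candidates = {
    "baseline": lambda x: int(x > 0.5),
    "always_cell": lambda x: 1,
}
for name, predict_fn in candidates.items():
    print(name, benchmark(predict_fn, samples, labels))
```

Running both architectures through the same function on the same held-out data keeps the comparison like for like.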

larissapoghosyan · 2025-03-18T23:04:44Z

Hey @adamltyson
Thank you for the detailed feedback - I'm glad to hear I'm on the right track.
You raise excellent points about this being just one part of the broader work needed.

I completely agree that comprehensive benchmarking is essential - I'll be conducting thorough accuracy measurements and speed testing. I'll also make sure to incorporate the suggested changes regarding external repository integration into the plan (will edit PR description accordingly).

For now, I'll keep this as a draft PR while I continue making progress. I'll wait for #493 to be merged before moving this PR from draft to ready for review status.

Looking forward to sharing the benchmarking results and further improvements.

adamltyson · 2025-04-10T09:29:57Z

Hi @larissapoghosyan, #493 is not likely to be merged for a while. Are you planning on continuing to work on this anytime soon? If not, we will close the PR (but feel free to reopen it anytime).

larissapoghosyan · 2025-04-14T23:05:09Z

Hi @adamltyson, I'm definitely planning to continue working on this (hopefully as part of GSoC), so we can close the PR for now. Your feedback was incredibly helpful, thank you!

larissapoghosyan added 2 commits March 16, 2025 18:51

Vision Transformers 3D classifiers

5d7e125

Pre-commit formatting

bee49eb

fix the model type detection

c066779

larissapoghosyan mentioned this pull request Mar 28, 2025

cellfinder: Exploring Newer Architectures for Classifier Network (Larissa Poghosyan) neuroinformatics-unit/gsoc#26

Merged

adamltyson closed this Apr 15, 2025
