Add integration example #461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
KanaiYuma-aist wants to merge 57 commits into main from feature/integration_example
Changes from 21 commits
Commits
57 commits
4e053ac Add integration example(WIP). (KanaiYuma-aist)
1b2986c Fix config. (KanaiYuma-aist)
2857756 Fix filename for mypy. (KanaiYuma-aist)
dbe17c7 Fix commnets. (KanaiYuma-aist)
1b74aac Fix typo. (KanaiYuma-aist)
58ac660 Merge branch 'main' into feature/integration_example (KanaiYuma-aist)
1885e48 Merge branch 'main' of github:aistairc/aiaccel into feature/integrati… (KanaiYuma-aist)
52dea6e Merge branch 'main' into feature/integration_example (KanaiYuma-aist)
dbddbf5 Add README. (KanaiYuma-aist)
cedc08e Merge branch 'feature/integration_example' of github:aistairc/aiaccel… (KanaiYuma-aist)
3c23b5e Fix typo. (KanaiYuma-aist)
bbc4d84 Merge branch 'main' of github:aistairc/aiaccel into feature/integrati… (KanaiYuma-aist)
4e129f2 Add Detailed Descriptions. (KanaiYuma-aist)
0b1d447 Remove tensorboard in README. (KanaiYuma-aist)
5bf3f72 Merge branch 'main' into feature/integration_example (KanaiYuma-aist)
672fb3b Add SaveValLossCallback. (KanaiYuma-aist)
8c8339f Remove unnecessary files. (KanaiYuma-aist)
f8db321 Fix README. (KanaiYuma-aist)
0327954 Merge branch 'main' into feature/integration_example (KanaiYuma-aist)
3ee480e Remove comment. (KanaiYuma-aist)
fd99f49 Merge branch 'feature/integration_example' of github:aistairc/aiaccel… (KanaiYuma-aist)
233ab80 Merge branch 'main' into feature/integration_example (KanaiYuma-aist)
4086167 Merge branch 'main' into feature/integration_example (KanaiYuma-aist)
618dbcf Merge branch 'main' of github:aistairc/aiaccel into feature/integrati… (KanaiYuma-aist)
2230afe Add pyproject.toml. (KanaiYuma-aist)
e6daf34 Rename hpo_config.yaml. (KanaiYuma-aist)
4fadd61 Merge branch 'main' of github:aistairc/aiaccel into feature/integrati… (KanaiYuma-aist)
1b88776 Add job_config.yaml. (KanaiYuma-aist)
b609120 Use SaveMetricCallback. (KanaiYuma-aist)
5925a69 Remove PYTHONPATH. (KanaiYuma-aist)
ff8a29d Update job_config.yaml. (KanaiYuma-aist)
cc840b6 Merge branch 'main' of github:aistairc/aiaccel into feature/integrati… (KanaiYuma-aist)
da071b2 Fix job_config. (KanaiYuma-aist)
4bf00e3 Fix README. (KanaiYuma-aist)
5c6a070 Merge and delete common_config. (KanaiYuma-aist)
b67a378 Change to local. (KanaiYuma-aist)
4fb787d Remove unnecessary contents. (KanaiYuma-aist)
872681f Fix README. (KanaiYuma-aist)
bb789f4 Fix script_prologue. (KanaiYuma-aist)
bb3a226 Rename task_for_integration_example. (KanaiYuma-aist)
600e3c9 Merge branch 'main' into feature/integration_example (yoshipon)
5369d97 Update structure (yoshipon)
5f75eed Update test (yoshipon)
7052b56 Merge remote-tracking branch 'origin/main' into feature/integration_e… (yoshipon)
5a0d0a4 Update CI (yoshipon)
ec2563d fix mypy (yoshipon)
9fe7231 fix mypy (yoshipon)
8a8b920 Update pre-commit (yoshipon)
a5b4adb pre-commit migrate-config (yoshipon)
63ad8f9 bugfix (yoshipon)
4fe6fd5 pre-commit autoupdate (yoshipon)
d05623e Update (yoshipon)
5a3fb68 Merge branch 'main' into feature/integration_example (yoshipon)
b40eccd Merge branch 'main' into feature/integration_example (yoshipon)
bd1049d updates (yoshipon)
6041ae0 Merge branch 'main' into feature/integration_example (yoshipon)
22eabfd Merge branch 'main' into feature/integration_example (yoshipon)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| # Example of Black Box Optimization on ABCI 3.0 | ||
|
|
||
| This is an example of performing black-box optimization of the learning rate for a ResNet50 model on the MNIST dataset. | ||
|
|
||
| ## Getting started | ||
|
|
||
| In an environment where aiaccel is installed, additionally install torchvision. | ||
|
|
||
| ```bash | ||
| pip install torchvision | ||
| ``` | ||
|
|
||
|
|
||
| Run the following command to perform black-box optimization. | ||
| Replace PATH_TO_ENV with the path to the environment prepared above. | ||
|
|
||
| ```bash | ||
| python -m aiaccel.hpo.apps.optimize "python -m aiaccel.jobs.cli.abci3 gpu --command_prefix 'cd \$PBS_O_WORKDIR && module load cuda/12.6/12.6.1 && module load python/3.13/3.13.2 && source PATH_TO_ENV/bin/activate &&' jobs/{job_name}.log -- python -m aiaccel.torch.apps.train resnet50/config.yaml task.optimizer_config.optimizer_generator.lr={lr} trainer.logger.name=lr_{lr} out_filename={out_filename}" --config config.yaml | ||
| ``` | ||
|
|
||
| ## Detailed Descriptions | ||
|
|
||
| The target function for optimization using aiaccel.hpo.apps.optimize is objective_integration.main. | ||
| Within objective_integration.main, aiaccel.torch.apps.train is called with the suggested learning rate, and the resulting validation loss is returned as the objective value. | ||
|
|
||
| Detailed descriptions of torch and optimize are available in the aiaccel documentation: [aiaccel document(torch)](https://aistairc.github.io/aiaccel/user_guide/torch.html) and [aiaccel document(optimize)](https://aistairc.github.io/aiaccel/user_guide/hpo.html). | ||
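
The command template in the README contains the placeholders `{job_name}`, `{lr}`, and `{out_filename}`. The sketch below illustrates how such a template could be filled for one trial, assuming substitution in the style of Python's `str.format`. The actual mechanism inside aiaccel is not shown in this PR, the template is shortened, and all trial values are hypothetical.

```python
# Hypothetical illustration: aiaccel's real substitution logic is not shown in this PR.
# Shortened version of the command template from the README.
template = (
    "jobs/{job_name}.log -- python -m aiaccel.torch.apps.train resnet50/config.yaml "
    "task.optimizer_config.optimizer_generator.lr={lr} "
    "trainer.logger.name=lr_{lr} out_filename={out_filename}"
)

# Made-up values for a single trial.
trial_values = {"job_name": "trial_0000", "lr": 1e-3, "out_filename": "trial_0000.json"}

# Every placeholder is replaced by the corresponding trial value.
command = template.format(**trial_values)
print(command)
```

Each trial would then launch its own training job with a different `lr` and write its result to its own `out_filename`.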
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| trainer: | ||
|   max_epochs: 10 | ||
| | ||
|   callbacks: | ||
|     - _target_: lightning.pytorch.callbacks.ModelCheckpoint | ||
|       filename: "{epoch:04d}" | ||
|       save_last: True | ||
|       save_top_k: -1 | ||
|     - _target_: torchvision_task_integration.SaveValLossCallback | ||
|       output_path: ${out_filename} | ||
| | ||
| | ||
| datamodule: | ||
|   _target_: aiaccel.torch.lightning.datamodules.single_datamodule.SingleDataModule | ||
| | ||
|   train_dataset_fn: | ||
|     _partial_: true | ||
|     _target_: torchvision.datasets.MNIST | ||
|     train: True | ||
| | ||
|   val_dataset_fn: | ||
|     _partial_: true | ||
|     _target_: torchvision.datasets.MNIST | ||
|     train: False | ||
| | ||
|   common_args: | ||
|     root: "./dataset" | ||
|     download: True | ||
|     transform: ${transform} | ||
| | ||
|   batch_size: 128 | ||
|   use_scatter: False | ||
| | ||
| | ||
| transform: | ||
|   _target_: torchvision.transforms.Compose | ||
|   transforms: | ||
|     - _target_: torchvision.transforms.Resize | ||
|       size: [256, 256] | ||
|     - _target_: torchvision.transforms.Grayscale | ||
|       num_output_channels: 3 | ||
|     - _target_: torchvision.transforms.ToTensor | ||
|     - _target_: torchvision.transforms.Normalize | ||
|       mean: [0.5] | ||
|       std: [0.5] | ||
| | ||
| task: | ||
|   _target_: torchvision_task_integration.ImageClassificationTask | ||
|   num_classes: 10 | ||
| | ||
|   model: | ||
|     _target_: torchvision.models.resnet50 | ||
|     weights: | ||
|       _target_: hydra.utils.get_object | ||
|       path: torchvision.models.ResNet50_Weights.DEFAULT | ||
| | ||
|   optimizer_config: | ||
|     _target_: aiaccel.torch.lightning.OptimizerConfig | ||
|     optimizer_generator: | ||
|       _partial_: True | ||
|       _target_: torch.optim.Adam | ||
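
The `_partial_: true` entries in this config ask Hydra to return a `functools.partial` rather than calling the target immediately, so the remaining arguments can be supplied later. The sketch below shows that pattern with the standard library only. `FakeDataset` is a hypothetical stand-in for `torchvision.datasets.MNIST`, and the assumption that the datamodule passes `common_args` when it calls the partial is not confirmed by this diff.

```python
from functools import partial


class FakeDataset:
    """Hypothetical stand-in for torchvision.datasets.MNIST."""

    def __init__(self, root: str, train: bool, download: bool = False) -> None:
        self.root = root
        self.train = train
        self.download = download


# Roughly what `_partial_: true` yields for the two dataset entries:
# a callable with `train` already bound, but not yet invoked.
train_dataset_fn = partial(FakeDataset, train=True)
val_dataset_fn = partial(FakeDataset, train=False)

# Assumption: shared arguments (like `common_args`) are supplied at call time.
common_args = {"root": "./dataset", "download": True}
train_dataset = train_dataset_fn(**common_args)
val_dataset = val_dataset_fn(**common_args)
```

The same idea applies to `optimizer_generator`: the optimizer class is bound to its hyperparameters first and receives the model parameters only when training starts.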
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| params: | ||
|   _convert_: partial | ||
|   _target_: aiaccel.hpo.apps.optimize.HparamsManager | ||
|   lr: | ||
|     _target_: aiaccel.hpo.optuna.suggest_wrapper.SuggestFloat | ||
|     name: lr | ||
|     low: 1.e-6 | ||
|     high: 1.e-2 | ||
|     log: true | ||
| | ||
| n_trials: 30 | ||
| n_max_jobs: 4 | ||
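
With `log: true`, the learning rate is searched on a logarithmic scale between `low` and `high`. The sketch below is a rough illustration of what log-scale sampling means. It uses plain random sampling, not the sampler that Optuna or aiaccel actually applies, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Same bounds and trial count as in the config: low=1e-6, high=1e-2, n_trials=30.
samples = [sample_log_uniform(1e-6, 1e-2, rng) for _ in range(30)]
print(min(samples), max(samples))
```

On this scale each decade (for example 1e-6 to 1e-5 and 1e-3 to 1e-2) is equally likely to be drawn, which suits a parameter spanning four orders of magnitude.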
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| _base_: | ||
| - ${base_config_path}/train_base.yaml | ||
| - ../common_config.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| import json | ||
| | ||
| import torch | ||
| from torch import nn | ||
| from torch.nn import functional as func | ||
| | ||
| import lightning.pytorch as pl | ||
| | ||
| from torchmetrics.classification import MulticlassAccuracy | ||
| | ||
| from aiaccel.torch.lightning import OptimizerConfig, OptimizerLightningModule | ||
| | ||
| | ||
| class ImageClassificationTask(OptimizerLightningModule): | ||
|     def __init__(self, model: nn.Module, optimizer_config: OptimizerConfig, num_classes: int = 10): | ||
|         super().__init__(optimizer_config) | ||
| | ||
|         self.model = model | ||
|         if hasattr(self.model.fc, "in_features") and isinstance(self.model.fc.in_features, int): | ||
|             self.model.fc = nn.Linear(self.model.fc.in_features, num_classes) | ||
| | ||
|         self.train_accuracy = MulticlassAccuracy(num_classes=num_classes) | ||
|         self.val_accuracy = MulticlassAccuracy(num_classes=num_classes) | ||
| | ||
|     def forward(self, x: torch.Tensor) -> torch.Tensor: | ||
|         return self.model(x)  # type: ignore | ||
| | ||
|     def training_step(self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> torch.Tensor: | ||
|         x, y = batch | ||
| | ||
|         logits = self(x) | ||
| | ||
|         loss = func.cross_entropy(logits, y) | ||
| | ||
|         acc = self.train_accuracy(logits, y) | ||
|         self.log_dict( | ||
|             { | ||
|                 "training/loss": loss, | ||
|                 "training/acc": acc, | ||
|             }, | ||
|             prog_bar=True, | ||
|         ) | ||
| | ||
|         return loss | ||
| | ||
|     def validation_step(self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> None: | ||
|         x, y = batch | ||
| | ||
|         logits = self(x) | ||
| | ||
|         loss = func.cross_entropy(logits, y) | ||
| | ||
|         acc = self.val_accuracy(logits, y) | ||
|         self.log_dict( | ||
|             { | ||
|                 "validation/loss": loss, | ||
|                 "validation/acc": acc, | ||
|             }, | ||
|             prog_bar=True, | ||
|         ) | ||
| | ||
| | ||
| class SaveValLossCallback(pl.Callback): | ||
|     def __init__(self, output_path: str) -> None: | ||
|         super().__init__() | ||
|         self.output_path = output_path | ||
| | ||
|     def on_fit_end(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None: | ||
|         val_loss = trainer.callback_metrics.get("validation/loss") | ||
|         if val_loss is not None: | ||
|             val_loss_value = val_loss.item() | ||
|             with open(self.output_path, "w") as f: | ||
|                 json.dump(val_loss_value, f) | ||
|         else: | ||
|             print("Warning: 'validation/loss' not found in callback_metrics.") | ||
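
`SaveValLossCallback` writes the final validation loss to `output_path` as a bare JSON number. The sketch below shows the round trip, assuming the optimizer side reads the file back with `json.load`. That reader is not part of this diff, and the file name and loss value are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical value; in the callback it comes from val_loss.item().
val_loss_value = 0.0421

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the `out_filename` passed on the command line.
    output_path = Path(tmp) / "trial_0000.json"

    # Same write as in SaveValLossCallback.on_fit_end.
    with open(output_path, "w") as f:
        json.dump(val_loss_value, f)

    # Assumed reader side: load the single float back as the objective value.
    with open(output_path) as f:
        objective = json.load(f)

print(objective)
```

Because the file holds only one number, no schema is needed, but a trial that never logs `validation/loss` leaves no file behind and only prints the warning.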