Releases · Lightning-AI/pytorch-lightning

08 Dec 18:52

Borda

1.8.4

7eb5ff5

Weekly patch release

App

Added

Add code_dir argument to tracer run (#15771)
Added the CLI command lightning run model to launch a LightningLite accelerated script (#15506)
Added the CLI command lightning delete app to delete a lightning app on the cloud (#15783)
Added a CloudMultiProcessBackend which enables running a child App from within the Flow in the cloud (#15800)
Utility for pickling work object safely even from a child process (#15836)
Added AutoScaler component (#15769)
Added the property ready of the LightningFlow to inform when the Open App should be visible (#15921)
Added the private work attribute _start_method to customize how works are started (#15923)
Added a configure_layout method to the LightningWork which can be used to control how the work is handled in the layout of a parent flow (#15926)
Added the ability to run a Lightning App or Component directly from the Gallery using lightning run app organization/name (#15941)
Added automatic conversion of list and dict of works and flows to structures (#15961)

Changed

The MultiNode components now warn the user when running with num_nodes > 1 locally (#15806)
Cluster creation and deletion now wait by default (#15458)
Running an app without a UI locally no longer opens the browser (#15875)
Show a message when BuildConfig(requirements=[...]) is passed but a requirements.txt file is already present in the Work (#15799)
Show a message when BuildConfig(dockerfile="...") is passed but a Dockerfile file is already present in the Work (#15799)
Dropped name column from cluster list (#15721)
Apps without UIs no longer activate the "Open App" button when running in the cloud (#15875)
Wait for full file to be transferred in Path / Payload (#15934)

Removed

Removed the SingleProcessRuntime (#15933)

Fixed

Fixed SSH CLI command listing stopped components (#15810)
Fixed bug when launching apps on multiple clusters (#15484)
Fixed Sigterm Handler causing thread lock which caused KeyboardInterrupt to hang (#15881)
Fixed an MPS error for the multinode component (it now defaults to CPU on MPS devices, as PyTorch does not support distributed operations on MPS) (#15748)
Fixed the work not being stopped on success when passed directly to the LightningApp (#15801)
Fixed the PyTorch Inference locally on GPU (#15813)
Fixed the enable_spawn method of the WorkRunExecutor (#15812)
Fixed require/import decorator (#15849)
Fixed a bug where using L.app.structures would cause multiple apps to be opened and fail with an error in the cloud (#15911)
Fixed PythonServer generating noise on M1 (#15949)
Fixed multiprocessing breakpoint (#15950)
Fixed detection of a Lightning App running in debug mode (#15951)
Fixed ImportError on Multinode if package not present (#15963)

Lite

Fixed shuffle=False having no effect when using DDP/DistributedSampler (#15931)

PyTorch

Changed

Direct support for compiled models (#15922)

Fixed

Fixed issue with unsupported torch.inference_mode() on hpu backends (#15918)
Fixed LRScheduler import for PyTorch 2.0 (#15940)
Fixed fit_loop.restarting to be False for lr finder (#15620)
Fixed torch.jit.script-ing a LightningModule causing an unintended error message about deprecated use_amp property (#15947)

Full Changelog: 1.8.3...1.8.4

25 Nov 19:20

tchaton

1.8.3.post1

92fe188

Hotfix for Python Server

App

Changed

Fixed the PyTorch Inference locally on GPU (#15813)

Full Changelog: 1.8.3...1.8.3

23 Nov 15:03

Borda

1.8.3.post0

655ade6

Hotfix for requirements

Revert/s3fs (#15792)

* revert s3fs

* post

23 Nov 10:11

Borda

1.8.3

7d6cfb1

Weekly patch release

App

Changed

Deduplicate top-level lightning CLI command groups (#15761)
- lightning add ssh-key CLI command has been transitioned to lightning create ssh-key
- lightning remove ssh-key CLI command has been transitioned to lightning delete ssh-key
Set Torch inference mode for prediction (#15719)
Improved LightningTrainerScript start-up time (#15751)
Disable XSRF protection in StreamlitFrontend to support uploads on localhost (#15684)

Fixed

Fixed debugging with VSCode IDE (#15747)
Fixed setting property to the LightningFlow (#15750)

Lite

Changed

Temporarily removed support for Hydra multi-run (#15737)

PyTorch

Changed

Temporarily removed support for Hydra multi-run (#15737)
Switch from tensorboard to tensorboardx in TensorBoardLogger (#15728)

Full Changelog: 1.8.2...1.8.3

18 Nov 00:44

Borda

1.8.2

8bea72b

Weekly patch release

App

Added

Added title and description to ServeGradio (#15639)
Added a friendly error message when attempting to run the default cloud compute with a custom base image configured (#14929)

Changed

Improved support for running apps when dependencies aren't installed (#15711)
Changed the root directory of the app (which gets uploaded) to be the folder containing the app file, rather than any parent folder containing a .lightning file (#15654)
Enabled MultiNode Components to support state broadcasting (#15607)
Prevent artefactual "running from outside your current environment" error (#15647)
Rename failed -> error in tables (#15608)

Fixed

Fixed a race condition when overwriting the frontend with app info (#15398)
Fixed bi-directional queues sending delta with Drive Component name changes (#15642)
Fixed CloudRuntime works collection with structures and accelerated multi node startup time (#15650)
Fixed catimage import (#15712)
Parse all lines in app file looking for shebangs to run commands (#15714)

Lite

Fixed

Fixed the automatic fallback from LightningLite(strategy="ddp_spawn", ...) to LightningLite(strategy="ddp", ...) when on an LSF cluster (#15103)

PyTorch

Fixed

Make sure save_dir can be an empty str (#15638)
Fixed the automatic fallback from Trainer(strategy="ddp_spawn", ...) to Trainer(strategy="ddp", ...) when on an LSF cluster (#15103)

Full Changelog: 1.8.1...1.8.2

10 Nov 20:25

Borda

1.8.1

18c587e

Weekly patch release

App

Added

Added the start method to the work (#15523)
Added a MultiNode Component to run with distributed computation with any frameworks (#15524)
Expose RunWorkExecutor to the work and provide default ones for the MultiNode Component (#15561)
Added a start_with_flow flag to the LightningWork which can be disabled to prevent the work from starting at the same time as the flow (#15591)
Added support for running Lightning App with VSCode IDE debugger (#15590)
Added bi-directional delta updates between the flow and the works (#15582)
Added --setup flag to lightning run app CLI command allowing for dependency installation via app comments (#15577)
Auto-upgrade / detect environment mis-match from the CLI (#15434)
Added Serve component (#15609)

Changed

Changed flow.flows to be recursive to align its behavior with flow.works (#15466)
The params argument in TracerPythonScript.run no longer prepends -- automatically to parameters (#15518)
Only check versions / env when not in the cloud (#15504)
Periodically sync database to the drive (#15441)
Slightly safer multi node (#15538)
Reuse existing commands when running connect more than once (#15471)

Fixed

Fixed writing app name and id in connect.txt file for the command CLI (#15443)
Fixed missing root flow among the flows of the app (#15531)
Fixed a bug with the Multi Node Component and added some examples (#15557)
Fixed a bug where payload would take a very long time locally (#15557)
Fixed an issue with the lightning CLI taking a long time to error out when the cloud is not reachable (#15412)

Lite

Fixed

Fix an issue with the SLURM srun detection causing permission errors (#15485)
Fixed the import of lightning_lite causing a warning 'Redirects are currently not supported in Windows or MacOs' (#15610)

PyTorch

Fixed

Fixed TensorBoardLogger not validating the input array type when logging the model graph (#15323)
Fixed an attribute error in ColossalAIStrategy at import time when torch.distributed is not available (#15535)
Fixed an issue when calling fs.listdir with file URI instead of path in CheckpointConnector (#15413)
Fixed an issue with the BaseFinetuning callback not setting the track_running_stats attribute for batch normalization layers (#15063)
Fixed an issue with WandbLogger(log_model=True|'all') raising an error and not being able to serialize tensors in the metadata (#15544)
Fixed the gradient unscaling logic when using Trainer(precision=16) and fused optimizers such as Adam(..., fused=True) (#15544)
Fixed model state transfer in multiprocessing launcher when running multi-node (#15567)
Fixed manual optimization raising AttributeError with Bagua Strategy (#12534)
Fixed the import of pytorch_lightning causing a warning 'Redirects are currently not supported in Windows or MacOs' (#15610)

Full Changelog: 1.8.0...1.8.1

02 Nov 16:31

Borda

1.8.0.post1

0edeb21

Minor pkg stability fix

What's Changed

Implement freeze batchnorm with freezing track running stats by @PososikTeam in #15063
Pkg: fix parsing versions by @Borda in #15401
Remove pytest as a requirement to run app by @manskx in #15449

New Contributors

@PososikTeam made their first contribution in #15063

Full Changelog: 1.8.0...1.8.0.post1

Contributors

Borda, manskx, and PososikTeam

01 Nov 11:13

awaelchli

1.8.0

7ee0994

Lightning 1.8: Colossal-AI Strategy, Commands and Secrets for Apps, FSDP Improvements and More!

The core team is excited to announce the release of Lightning 1.8 ⚡

Highlights
Backward Incompatible Changes
Deprecations
Full Changelog
Contributors

Lightning v1.8 is the culmination of work from 52 contributors who have worked on features, bug fixes, and documentation for a total of more than 550 commits since v1.7.

Highlights

Colossal-AI

Colossal-AI focuses on improving efficiency when training large-scale AI models with billions of parameters. With the new Colossal-AI strategy in Lightning 1.8, you can train existing models like GPT-3 with up to half as many GPUs as usually needed. You can also train models up to twice as big with the same number of GPUs, saving you significant cost. Here is how you use it:

from lightning.pytorch import Trainer
from lightning.pytorch.strategies import ColossalAIStrategy

# Select the strategy with good defaults
trainer = Trainer(strategy="colossalai")

# or tune parameters to your liking (further ColossalAIStrategy arguments omitted here)
trainer = Trainer(strategy=ColossalAIStrategy(placement_policy="cpu"))

You can find Colossal-AI's benchmarks with Lightning on GPT-2 here.

Under the hood, Colossal-AI implements different parallelism algorithms that are especially interesting for the development of SOTA transformer models:

Data Parallelism
Pipeline Parallelism
1D, 2D, 2.5D, 3D Tensor Parallelism
Sequence Parallelism
Zero Redundancy Optimization
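To give a feel for what Zero Redundancy Optimization buys you, here is a back-of-the-envelope sketch in plain Python (not Colossal-AI code) of per-GPU memory for model states. It assumes the accounting from the ZeRO paper: mixed-precision Adam needs about 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights, momentum and variance), and each ZeRO stage partitions one more of those pieces across the data-parallel GPUs. The function name and the 7.5B-parameter, 64-GPU example are illustrative assumptions; activations, buffers and fragmentation are ignored.

```python
# Bytes per parameter for mixed-precision Adam, following the ZeRO paper's accounting.
FP16_PARAMS = 2   # fp16 weights
FP16_GRADS = 2    # fp16 gradients
OPTIMIZER_K = 12  # fp32 master weights + Adam momentum + Adam variance (4 + 4 + 4)


def model_state_bytes_per_gpu(num_params: int, num_gpus: int, stage: int) -> float:
    """Hypothetical helper: per-GPU bytes for model states under ZeRO stage 0-3."""
    psi = num_params
    if stage == 0:  # plain data parallelism: every GPU holds a full replica
        return (FP16_PARAMS + FP16_GRADS + OPTIMIZER_K) * psi
    if stage == 1:  # optimizer states partitioned across GPUs
        return (FP16_PARAMS + FP16_GRADS) * psi + OPTIMIZER_K * psi / num_gpus
    if stage == 2:  # optimizer states and gradients partitioned
        return FP16_PARAMS * psi + (FP16_GRADS + OPTIMIZER_K) * psi / num_gpus
    if stage == 3:  # optimizer states, gradients and parameters partitioned
        return (FP16_PARAMS + FP16_GRADS + OPTIMIZER_K) * psi / num_gpus
    raise ValueError(f"unknown ZeRO stage: {stage}")


for stage in range(4):
    gigabytes = model_state_bytes_per_gpu(7_500_000_000, 64, stage) / 1e9
    print(f"stage {stage}: {gigabytes:.1f} GB per GPU")
# stage 0: 120.0, stage 1: 31.4, stage 2: 16.6, stage 3: 1.9 (GB per GPU)
```

Colossal-AI adds its own memory management and CPU offloading on top of this idea (see placement_policy above), so real numbers will differ.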

Learn how to install and use Colossal-AI effectively with Lightning here.

NOTE: This strategy is marked as experimental. Stay tuned for more updates in the future.

Secrets for Lightning Apps

Introducing encrypted secrets (#14612), a feature requested by Lightning App users 🎉!

Encrypted secrets allow you to securely pass private data to your apps, like API keys, access tokens, database passwords, or other credentials, without exposing them in your code.

Add a secret to your Lightning account in lightning.ai (read more here)

Add an environment variable to your app to read the secret:

import os

# somewhere in your Flow or Work:
GitHubComponent(api_token=os.environ["API_TOKEN"])

Pass the secret to your app run with the following command:

lightning run app app.py --cloud --secret API_TOKEN=github_api_token

These secrets are encrypted and stored in the Lightning database. Nothing except your app can access the value.
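Inside the app, your code only ever sees an ordinary environment variable. The plain-Python sketch below mirrors the ENV_NAME=SECRET_NAME shape of the --secret flag, assuming, as the example above suggests, that the left side names the environment variable (API_TOKEN) and the right side names the stored secret (github_api_token). Both helper names are hypothetical and not part of the lightning package; read_secret just shows a friendlier failure than a bare KeyError when the app is started without the secret.

```python
import os
from typing import Dict, List


def parse_secret_bindings(values: List[str]) -> Dict[str, str]:
    """Hypothetical helper: ["API_TOKEN=github_api_token"] -> {"API_TOKEN": "github_api_token"}."""
    bindings: Dict[str, str] = {}
    for value in values:
        env_name, sep, secret_name = value.partition("=")
        if not (sep and env_name and secret_name):
            raise ValueError(f"expected ENV_NAME=SECRET_NAME, got {value!r}")
        bindings[env_name] = secret_name
    return bindings


def read_secret(env_name: str) -> str:
    """Hypothetical helper: read a bound secret, with a hint if it was never passed."""
    value = os.environ.get(env_name)
    if value is None:
        raise RuntimeError(
            f"{env_name} is not set; run the app with --secret {env_name}=<secret-name>"
        )
    return value
```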

NOTE: This is an experimental feature.

CLI Commands for Lightning Apps

Introducing CLI commands for apps (#13602)!
As a Lightning App builder, if you want to easily create a CLI for users to interact with your app, then this is for you.

Here is an example where users can dynamically create notebooks from the CLI.
All you need to do is implement the configure_commands hook on the LightningFlow:

import lightning as L
from commands.notebook.run import RunNotebook


class Flow(L.LightningFlow):
    ...

    def configure_commands(self):
        # Return a list of dictionaries with commands:
        return [{"run notebook": RunNotebook(method=self.run_notebook)}]


app = L.LightningApp(Flow())

Once the app is running with lightning run app app.py, you can connect to the app with the following command:

lightning connect {app name} -y

and run the command that was configured:

lightning run notebook --name=my_notebook_name
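Conceptually, configure_commands builds a registry that maps command names to handlers. Here is a toy, local-only sketch of that lookup: ToyFlow and dispatch are hypothetical names, a plain method stands in for the RunNotebook command object, and the real CLI sends the command to the running app after lightning connect rather than calling it in-process.

```python
from typing import Callable, Dict, List


class ToyFlow:
    """Hypothetical stand-in for a LightningFlow: only the command plumbing is modeled."""

    def __init__(self) -> None:
        self.notebooks: List[str] = []

    def run_notebook(self, name: str) -> str:
        self.notebooks.append(name)
        return f"started notebook {name}"

    def configure_commands(self) -> List[Dict[str, Callable[..., str]]]:
        # Same shape as the hook above: a list of {command name: handler} dicts.
        return [{"run notebook": self.run_notebook}]


def dispatch(flow: ToyFlow, argv: List[str]) -> str:
    """Resolve argv to a registered command and call it with its --key=value options."""
    registry = {name: fn for entry in flow.configure_commands() for name, fn in entry.items()}
    words = [arg for arg in argv if not arg.startswith("--")]
    options = dict(arg[2:].split("=", 1) for arg in argv if arg.startswith("--"))
    # Try the longest word sequence first, so "run notebook" wins over a shorter "run".
    for n in range(len(words), 0, -1):
        name = " ".join(words[:n])
        if name in registry:
            return registry[name](**options)
    raise KeyError(f"unknown command: {' '.join(words)}")


flow = ToyFlow()
print(dispatch(flow, ["run", "notebook", "--name=my_notebook_name"]))  # started notebook my_notebook_name
```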

For a full tutorial and running example, visit our docs.
NOTE: This is an experimental feature.

Auto-wrapping for FSDP Strategy

In Lightning v1.7, we introduced an integration for PyTorch FSDP in the form of our FSDP strategy, which allows you to train huge models with billions of parameters sharded across hundreds of GPUs and machines.

# Native FSDP implementation
trainer = Trainer(strategy="fsdp_native")

We are continuing to improve the support for this feature by adding automatic wrapping of layers for use cases where the model fits into CPU memory, but not into GPU memory (#14383).

Here are some examples:

Case 1: Model is so large that it does not fit into CPU memory.
Construct your layers in the configure_sharded_model hook and wrap the large ones you want to shard across GPUs:

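from lightning.pytorch import LightningModule
from torch.distributed.fsdp.wrap import wrap  # for strategy="fsdp" (FairScale): from fairscale.nn import wrap


class MassiveModel(LightningModule):
    ...

    # Create model here and wrap the large layers for sharding
    def configure_sharded_model(self):
        for i, layer in enumerate(self.block):
            self.block[i] = wrap(layer)
        ...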

Case 2: Model fits into CPU memory, but not into GPU memory. In Lightning v1.8, you no longer need to do anything special here, as we can automatically wrap the layers for you using FSDP's policy:

from lightning.pytorch import Trainer

model = MassiveModel()
trainer = Trainer(
    accelerator="gpu",
    devices=8,
    strategy="fsdp_native",  # or strategy="fsdp" for FairScale
    precision=16,
)

# Automatically wraps the layers here:
trainer.fit(model)

Case 3: Model fits into GPU memory. No action required, use any strategy you want.

Note: if you want to manually wrap layers for more control, you can still do that!

Read more about FSDP and how layer wrapping works in our docs.
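Auto-wrapping decides which submodules become their own FSDP units, typically by size. The sketch below is a simplified, pure-Python illustration of a size-based policy in the spirit of PyTorch's size_based_auto_wrap_policy: children are visited first, and a module is wrapped once the parameters not already claimed by wrapped descendants reach a threshold. ToyModule, auto_wrap and the 50-parameter threshold are hypothetical, and this is not necessarily the exact policy Lightning configures for you.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyModule:
    """Hypothetical stand-in for an nn.Module: own parameter count plus child modules."""
    name: str
    own_params: int
    children: List["ToyModule"] = field(default_factory=list)


def auto_wrap(module: ToyModule, min_num_params: int, wrapped: List[str]) -> int:
    """Wrap bottom-up by size and return how many parameters of `module` stay unwrapped."""
    remaining = module.own_params
    for child in module.children:  # post-order: children decide first
        remaining += auto_wrap(child, min_num_params, wrapped)
    if remaining >= min_num_params:
        wrapped.append(module.name)  # this subtree becomes its own shard unit
        return 0
    return remaining


model = ToyModule("model", 0, [
    ToyModule("block", 0, [ToyModule("layer0", 60), ToyModule("layer1", 60)]),
    ToyModule("head", 10),
])
wrapped: List[str] = []
leftover = auto_wrap(model, min_num_params=50, wrapped=wrapped)
print(wrapped, leftover)  # ['layer0', 'layer1'] 10
```

Anything left over (the 10 head parameters here) would belong to the outermost FSDP wrapper around the whole model.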

New Tuner Callbacks

In this release, we focused on Tuner improvements and introduced two new callbacks that let you customize the batch size finder and the learning rate finder to suit your use case.

Batch Size Finder (#11089)

You can customize the BatchSizeFinder callback to run at different epochs. This feature is useful while fine-tuning models since you can't always use the same batch size after unfreezing the backbone.

from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import BatchSizeFinder


class FineTuneBatchSizeFinder(BatchSizeFinder):
    def __init__(self, milestones, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.milestones = milestones

    def on_fit_start(self, *args, **kwargs):
        return

    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch in self.milestones or trainer.current_epoch == 0:
            self.scale_batch_size(trainer, pl_module)


trainer = Trainer(callbacks=[FineTuneBatchSizeFinder(milestones=(5, 10))])
trainer.fit(...)
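BatchSizeFinder searches for the largest batch size that fits in memory and offers a "power" and a "binsearch" mode. The pure-Python sketch below shows the idea behind both; fits_in_memory is a hypothetical stand-in for running a few training steps at a given batch size and catching an out-of-memory error, and the 48-sample limit is made up. The real callback's bookkeeping differs in the details.

```python
from typing import Callable


def power_scale(fits: Callable[[int], bool], init_val: int = 2, max_trials: int = 25) -> int:
    """'power'-style search: double until a trial fails, keep the last size that fit."""
    size, best = init_val, 0
    for _ in range(max_trials):
        if not fits(size):
            break
        best = size
        size *= 2
    return best


def binsearch_scale(fits: Callable[[int], bool], init_val: int = 2, max_trials: int = 25) -> int:
    """'binsearch'-style search: double until a failure, then bisect the gap."""
    low, high = 0, init_val
    for _ in range(max_trials):
        if not fits(high):
            break
        low, high = high, high * 2
    else:
        return low  # never failed within max_trials: keep the largest size tried
    while high - low > 1:  # invariant: low fits (or is 0), high does not
        mid = (low + high) // 2
        if fits(mid):
            low = mid
        else:
            high = mid
    return low


def fits_in_memory(batch_size: int) -> bool:
    """Hypothetical device that can hold at most 48 samples per batch."""
    return batch_size <= 48


print(power_scale(fits_in_memory))      # 32
print(binsearch_scale(fits_in_memory))  # 48
```

The binsearch variant costs a few more trials but can land on sizes that are not powers of two.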

Run batch size finder for validate/test/predict.

from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import BatchSizeFinder


class EvalBatchSizeFinder(BatchSizeFinder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def on_fit_start(self, *args, **kwargs):
        return

    def on_test_start(self, trainer, pl_module):
        self.scale_batch_size(trainer, pl_module)


trainer = Trainer(callbacks=[EvalBatchSizeFinder()])
trainer.test(...)

Learning Rate Finder (#13802)

You can now use the LearningRateFinder callback to run at different intervals. This feature is useful when fine-tuning models, for example.

from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import LearningRateFinder


class FineTuneLearningRateFinder(LearningRateFinder):
    def __init__(self, milestones, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.milestones = milestones

    def on_fit_start(self, *args, **kwargs):
        return

    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch in self.milestones or trainer.current_epoch == 0:
            self.lr_find(trainer, pl_module)

trainer = Trainer(callbacks=[FineTuneLearningRateFinder(milestones=(5, 10))])
trainer.fit(...)
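The learning rate finder runs a short range test: it raises the learning rate step by step (exponentially by default), records the loss, and suggests the rate where the loss falls fastest. Below is a simplified pure-Python sketch with made-up losses; the helper names are hypothetical, and Lightning's suggestion additionally smooths the loss curve and skips the first and last points, which this sketch omits.

```python
from typing import List


def exponential_lrs(min_lr: float, max_lr: float, num_steps: int) -> List[float]:
    """Learning rates growing geometrically from min_lr to max_lr, one per step."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    ratio = max_lr / min_lr
    return [min_lr * ratio ** (step / (num_steps - 1)) for step in range(num_steps)]


def suggest_lr(lrs: List[float], losses: List[float]) -> float:
    """Return the learning rate at the start of the steepest loss drop."""
    slopes = [after - before for before, after in zip(losses, losses[1:])]
    steepest = min(range(len(slopes)), key=lambda i: slopes[i])
    return lrs[steepest]


lrs = exponential_lrs(1e-4, 1e-1, num_steps=4)  # roughly [1e-4, 1e-3, 1e-2, 1e-1]
losses = [2.0, 1.9, 1.2, 3.0]                    # made up: fastest drop between steps 1 and 2
print(suggest_lr(lrs, losses))                   # about 1e-3
```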

LightningCLI Improvements

Even though the LightningCLI class is designed to help in the implementation of command line tools, there are instances when it might be more desirable to run directly from Python. In Lightning 1.8, you can now do this (#14596):

from lightning.pytorch.cli import LightningCLI

def cli_main(args):
    cli = LightningCLI(MyModel, ..., args=args)
    ...

Anywhere in your program, you can now call the CLI directly:

cli_main(["--trainer.max_epochs=100", "--model.encoder_layers=24"])
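LightningCLI itself is built on jsonargparse, but the design choice is easy to see with the standard library alone: accept an optional list of arguments and only fall back to sys.argv when none is given. The sketch below is a hypothetical argparse analogue of cli_main; the option names mirror the example above and the defaults are made up.

```python
import argparse
from typing import List, Optional


def cli_main(args: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse dotted options from args; argparse reads sys.argv[1:] when args is None."""
    parser = argparse.ArgumentParser(prog="train")
    # Dotted option names are kept verbatim as attribute names ("trainer.max_epochs").
    parser.add_argument("--trainer.max_epochs", type=int, default=1)
    parser.add_argument("--model.encoder_layers", type=int, default=6)
    return parser.parse_args(args)


config = cli_main(["--trainer.max_epochs=100", "--model.encoder_layers=24"])
print(getattr(config, "trainer.max_epochs"), getattr(config, "model.encoder_layers"))  # 100 24
```

Passing the list explicitly is also what makes this kind of entry point easy to call from tests or notebooks.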

Learn about all features of the LightningCLI!

Improvements to the SLURM Support

Multi-node training on a SLURM cluster has been supported since the inception of Lightning Trainer, and has seen several improvements over time thanks to many community contributions. And we just keep going! In this release, we've added two quality of life improvements:

The preemption/termination signal is now configurable (#14626):

# the default signal is SIGUSR1
trainer = Trainer(plugins=[...
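At the process level, making the termination signal configurable changes which signal a handler listens for. Here is a stdlib-only sketch, not Lightning's SLURM plugin: RequeueSignalListener is a hypothetical name, and the handler only sets a flag that a training loop could poll before checkpointing. On SLURM you would typically ask the scheduler to send that signal ahead of the time limit, for example with #SBATCH --signal=SIGUSR1@90. POSIX only; handlers must be installed from the main thread.

```python
import signal


class RequeueSignalListener:
    """Hypothetical helper: records that a preemption/requeue signal arrived."""

    def __init__(self, sig: signal.Signals = signal.SIGUSR1) -> None:
        self.sig = sig
        self.requested = False
        signal.signal(sig, self._handle)

    def _handle(self, signum, frame) -> None:
        # Keep the handler tiny: a training loop would poll this flag,
        # save a checkpoint and exit so the scheduler can requeue the job.
        self.requested = True


listener = RequeueSignalListener(signal.SIGUSR2)
signal.raise_signal(signal.SIGUSR2)  # simulate the scheduler sending the signal
print(listener.requested)  # True
```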

Contributors

nicolai86, daniel347x, and 86 other contributors

20 Oct 15:07

Borda

App/0.7.0

65d29f0

Apps's secrets & meta tags

[0.7.0] - 2022-10-20

Added

Add --secret option to CLI to allow binding Secrets to app environment variables when running in the cloud (#14612)
Added support for adding descriptions to commands either through a docstring or the DESCRIPTION attribute (#15193)
Added option to add custom meta tags to the UI container (#14915)
Added support to pass a LightningWork to the LightningApp (#15215)

Changed

Allowed root path to run the app on /path (#14972)

07 Oct 20:45

Borda

app/0.6.3

9417739

App with meta tags

[0.6.3] - 2022-10-07

Added

Added option to add custom meta tags to the UI container (#14915)

Changed

Allowed root path to run the app on /path (#14972)

Contributors

@pritamsoni-hsr

If we forgot someone due to not matching commit email with GitHub account, let us know :]
