Try to run tests with pytorch 2.3 #19859

Open: wants to merge 3 commits into master


hmaarrfk (Contributor)

Test for #19765

codecov-commenter commented Jun 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.40%. Comparing base (9cf4e94) to head (e74d9b3).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19859      +/-   ##
==========================================
- Coverage   78.95%   73.40%   -5.55%     
==========================================
  Files         498      498              
  Lines       46129    46129              
  Branches     8487     8487              
==========================================
- Hits        36419    33859    -2560     
- Misses       8001    10619    +2618     
+ Partials     1709     1651      -58     
Flag               Coverage Δ
keras              73.32% <ø> (-5.49%) ⬇️
keras-jax          62.28% <ø> (ø)
keras-numpy        56.88% <ø> (ø)
keras-tensorflow   63.52% <ø> (ø)
keras-torch        ?
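The totals above are consistent with Codecov counting partially covered lines as not covered, i.e. coverage = hits / (hits + misses + partials). Since Files and Lines are unchanged and only the keras-torch flag shows "?", the 5.55% drop presumably reflects missing torch-backend coverage on the head commit rather than untested code. A quick arithmetic check using only the numbers from the diff (the dictionary and function names are illustrative):

```python
# Line counts copied from the Codecov diff above (base = master, head = this PR).
BASE = {"hits": 36419, "misses": 8001, "partials": 1709}
HEAD = {"hits": 33859, "misses": 10619, "partials": 1651}


def coverage_percent(counts):
    """Line coverage in percent; partially covered lines count as not covered."""
    lines = counts["hits"] + counts["misses"] + counts["partials"]
    return 100.0 * counts["hits"] / lines


base_pct = coverage_percent(BASE)
head_pct = coverage_percent(HEAD)
print(f"{base_pct:.2f}% -> {head_pct:.2f}% ({head_pct - base_pct:+.2f}%)")
# prints: 78.95% -> 73.40% (-5.55%)
```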

Flags with carried-forward coverage are not shown.


hmaarrfk (Contributor, Author)

Sorry, there seemed to be one more place where versions are pinned.

hmaarrfk (Contributor, Author)

P.S. From what I can test, it seems the build for my first commit was:

  • Built with pytorch 2.2.2
  • Tested with pytorch 2.3.1 (the tests passed)
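Comparing only the major.minor part of the two versions makes the mismatch explicit. A minimal sketch for the version strings that appear in this thread; release_tuple and same_minor are illustrative helpers, and packaging.version.Version is the more complete tool for arbitrary PEP 440 versions.

```python
import re


def release_tuple(version):
    """Return the leading numeric release segment of a version string as ints.

    Handles forms seen in this thread, such as "2.3.1+cu121" and
    "2.3.0.post301"; any suffix after the numeric release is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def same_minor(built_with, tested_with):
    """True if both versions share the same major.minor release."""
    return release_tuple(built_with)[:2] == release_tuple(tested_with)[:2]


print(same_minor("2.2.2", "2.3.1"))  # False: built and tested on different minors
```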

hmaarrfk (Contributor, Author)

Wow, this is unexpected. I need to see what is different about the test environment...

hmaarrfk (Contributor, Author)

It's installed from conda-forge, but the test seems to pass :/

$ pip list | grep -E "(torch|keras)"
keras                     3.3.3                     /home/mark/git/keras
tf_keras                  2.16.0
torch                     2.3.0.post301

keras/src/callbacks/swap_ema_weights_test.py::SwapEMAWeightsTest::test_swap_ema_weights PASSED        [100%]
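The same filter can be run from inside the interpreter that pytest uses, which rules out reading a different environment's pip. A sketch using only the standard library's importlib.metadata; matching_distributions is an illustrative helper, not part of Keras.

```python
import re
from importlib import metadata

# Same filter as `pip list | grep -E "(torch|keras)"`, done in-process.
PATTERN = re.compile(r"torch|keras")


def matching_distributions(pattern=PATTERN):
    """Return sorted (name, version) pairs of installed distributions
    whose name matches `pattern`."""
    found = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name and pattern.search(name):
            found.add((name, dist.version))
    return sorted(found)


for name, version in matching_distributions():
    print(f"{name:<25} {version}")
```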

fchollet (Member)

> It's installed from conda-forge, but the test seems to pass :/

You mean installing with pip fails but installing with conda works?

I can reproduce the issue on macOS (with the pip package), btw.

hmaarrfk (Contributor, Author) commented Jun 16, 2024

> You mean installing with pip fails but installing with conda works?

I'm not sure; I'll have to recreate my environment from scratch to confirm.

> I can reproduce the issue on macOS (with the pip package), btw.

This is a good sign!

hmaarrfk (Contributor, Author) commented Jun 16, 2024

I just can't reproduce it on Linux...

$ pytest keras/src/callbacks/swap_ema_weights_test.py::SwapEMAWeightsTest::test_swap_ema_weights
===================================== test session starts =====================================
platform linux -- Python 3.10.14, pytest-8.2.2, pluggy-1.5.0 -- /home/mark/miniforge3/envs/keras/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/mark/git/keras
configfile: pyproject.toml
plugins: cov-5.0.0
collected 1 item

keras/src/callbacks/swap_ema_weights_test.py::SwapEMAWeightsTest::test_swap_ema_weights PASSED [100%]

====================================== 1 passed in 1.43s ======================================

I tried both with my NVIDIA drivers (550) installed and without them.

Installation steps:
conda create --name keras python=3.10
conda activate keras
pip install -r requirements-common.txt -r requirements-torch-cuda.txt
pip install -e . -vv --no-deps
Resulting environment:
$ pip list
Package                      Version      Editable project location
---------------------------- ------------ -------------------------
absl-py                      2.1.0
astunparse                   1.6.3
beautifulsoup4               4.12.3
black                        24.4.2
build                        1.2.1
certifi                      2024.6.2
charset-normalizer           3.3.2
click                        8.1.7
coverage                     7.5.3
dm-tree                      0.1.8
exceptiongroup               1.2.1
filelock                     3.15.1
flake8                       7.1.0
flatbuffers                  24.3.25
fsspec                       2024.6.0
gast                         0.5.4
google                       3.0.0
google-pasta                 0.2.0
grpcio                       1.64.1
gviz-api                     1.10.0
h5py                         3.11.0
idna                         3.7
iniconfig                    2.0.0
isort                        5.13.2
jax                          0.4.28
jaxlib                       0.4.28
Jinja2                       3.1.4
keras                        3.3.3        /home/mark/git/keras
libclang                     18.1.1
Markdown                     3.6
markdown-it-py               3.0.0
MarkupSafe                   2.1.5
mccabe                       0.7.0
mdurl                        0.1.2
ml-dtypes                    0.3.2
mpmath                       1.3.0
mypy-extensions              1.0.0
namex                        0.0.8
networkx                     3.3
numpy                        1.26.4
nvidia-cublas-cu12           12.1.3.1
nvidia-cuda-cupti-cu12       12.1.105
nvidia-cuda-nvrtc-cu12       12.1.105
nvidia-cuda-runtime-cu12     12.1.105
nvidia-cudnn-cu12            8.9.2.26
nvidia-cufft-cu12            11.0.2.54
nvidia-curand-cu12           10.3.2.106
nvidia-cusolver-cu12         11.4.5.107
nvidia-cusparse-cu12         12.1.0.106
nvidia-nccl-cu12             2.20.5
nvidia-nvjitlink-cu12        12.5.40
nvidia-nvtx-cu12             12.1.105
opt-einsum                   3.3.0
optree                       0.11.0
packaging                    24.1
pandas                       2.2.2
pathspec                     0.12.1
pillow                       10.3.0
pip                          24.0
platformdirs                 4.2.2
pluggy                       1.5.0
protobuf                     4.25.3
pycodestyle                  2.12.0
pyflakes                     3.2.0
Pygments                     2.18.0
pyproject_hooks              1.1.0
pytest                       8.2.2
pytest-cov                   5.0.0
python-dateutil              2.9.0.post0
pytz                         2024.1
requests                     2.32.3
rich                         13.7.1
scipy                        1.13.1
setuptools                   70.0.0
six                          1.16.0
soupsieve                    2.5
sympy                        1.12.1
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorboard_plugin_profile   2.15.1
tensorflow-cpu               2.16.1
tensorflow-io-gcs-filesystem 0.37.0
termcolor                    2.4.0
tomli                        2.0.1
torch                        2.3.1+cu121
torchvision                  0.18.1+cu121
triton                       2.3.1
typing_extensions            4.12.2
tzdata                       2024.1
urllib3                      2.2.1
Werkzeug                     3.0.3
wheel                        0.43.0
wrapt                        1.16.0
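The two torch builds in this thread can be told apart by their version strings: the pip-installed one above reports 2.3.1+cu121, while the conda-forge one reported 2.3.0.post301. Under PEP 440, the part after "+" is a local version label. A small sketch (local_label is a hypothetical helper); at runtime, torch.version.cuda gives the CUDA version the installed torch was built with, or None for a CPU-only build.

```python
def local_label(version):
    """Return the PEP 440 local version label (the text after '+'), or None."""
    _, sep, label = version.partition("+")
    return label if sep else None


# Version strings taken from the package lists in this thread.
for v in ("2.3.1+cu121", "0.18.1+cu121", "2.3.0.post301"):
    print(v, "->", local_label(v))
```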

fchollet (Member)

Not sure what's going on with conda, but my hunch is that the bug with torch 2.3 + compile + keras is real, not a build issue.

hmaarrfk (Contributor, Author) commented Jun 16, 2024

> Not sure what's going on with conda, but my hunch is that the bug with torch 2.3 + compile + keras is real, not a build issue.

In the reproducer environment above, all Python packages are installed with pip; I only installed Python itself through conda.

How can I test that torch + compile is being invoked?

fchollet (Member)

> How can I test that torch + compile is being invoked?

The test does that; there's nothing in your environment that it depends on.

You can make sure that you're using the torch backend by setting the environment variable KERAS_BACKEND="torch". You can double-check by looking at keras.config.backend().

hmaarrfk (Contributor, Author)

Great, thank you! Now

KERAS_BACKEND="torch" pytest keras/src/callbacks/swap_ema_weights_test.py::SwapEMAWeightsTest::test_swap_ema_weights

correctly recreates the issue for me!
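The same invocation can be scripted so the backend override applies only to the child process. A sketch with illustrative helpers (pytest_command and run_under_backend are not part of Keras); using sys.executable keeps pytest on the same interpreter, which helps rule out the environment mix-ups discussed above.

```python
import os
import subprocess
import sys

# Test node from this thread; run from the repository root.
TEST_ID = (
    "keras/src/callbacks/swap_ema_weights_test.py"
    "::SwapEMAWeightsTest::test_swap_ema_weights"
)


def pytest_command(test_id, backend="torch"):
    """Build argv and env equivalent to `KERAS_BACKEND=<backend> pytest <test_id>`."""
    argv = [sys.executable, "-m", "pytest", test_id]
    # Copy the current environment so only the child process sees the override.
    env = {**os.environ, "KERAS_BACKEND": backend}
    return argv, env


def run_under_backend(test_id, backend="torch"):
    """Run one pytest node in a child process and return pytest's exit code."""
    argv, env = pytest_command(test_id, backend)
    return subprocess.run(argv, env=env).returncode
```

Calling run_under_backend(TEST_ID) from the repository root returns 0 if the test passes and 1 if it fails.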

An even simpler failing test may be:

$ pytest integration_tests/torch_workflow_test.py
===================================== test session starts =====================================
platform linux -- Python 3.10.14, pytest-8.2.2, pluggy-1.5.0 -- /home/mark/miniforge3/envs/keras/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/mark/git/keras
configfile: pyproject.toml
plugins: cov-5.0.0
collected 1 item

integration_tests/torch_workflow_test.py::TorchWorkflowTest::test_keras_layer_in_nn_module FAILED [100%]

========================================== FAILURES ===========================================
_______________________ TorchWorkflowTest.test_keras_layer_in_nn_module _______________________

self = <torch_workflow_test.TorchWorkflowTest testMethod=test_keras_layer_in_nn_module>

    def test_keras_layer_in_nn_module(self):
        net = Net()

        # Test using Keras layer in a nn.Module.
        # Test forward pass
        self.assertAllEqual(list(net(torch.empty(100, 10)).shape), [100, 1])
        # Test KerasVariables are added as nn.Parameter.
>       self.assertLen(list(net.parameters()), 2)

integration_tests/torch_workflow_test.py:26:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
keras/src/testing/test_case.py:77: in assertLen
    self.assertEqual(len(iterable), expected_len, msg=msg)
E   AssertionError: 0 != 2
------------------------------------ Captured stderr call -------------------------------------
2024-06-16 20:17:51.680283: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-16 20:17:51.680884: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
=================================== short test summary info ===================================
FAILED integration_tests/torch_workflow_test.py::TorchWorkflowTest::test_keras_layer_in_nn_module - AssertionError: 0 != 2
====================================== 1 failed in 0.35s ======================================

gbaned added this to Assigned Reviewer in PR Queue via automation Jun 17, 2024
gbaned requested a review from fchollet June 17, 2024 04:52
Projects: PR Queue (Assigned Reviewer)
Linked issues that merging may close: none yet
4 participants