Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: modifying code generation to reduce bundle size #4978

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

gvwilson
Copy link
Contributor

@gvwilson gvwilson commented Jan 22, 2025

feat: modify code generation to reduce bundle size

branch format bytes %age
master .whl 14803768
codegen2 .whl 12215667 -18%
master .tar.gz 8100014
codegen2 .tar.gz 6114458 -24%
  1. Assign an empty string to the data_docs field of generated
    validators. (This has a major impact because those docs are
    duplicated many times.)

  2. Alias name of base validator during import in
    codegen/validators.py.

  3. Remove the long list of CSS colors from help strings for color
    properties.

  4. Replace super(Parent, self) with super() in generated code.

  5. Modify commands.py to run code generation.

  6. Remove comments from generated code.

  7. Replaced named arguments in constructors with positional arguments.

  8. Drop use of sys.version_info and TYPE_CHECKING. Removed the check
    for Python < 3.7 using sys.version_info and as a backup checking
    typing.TYPE_CHECKING; this saves a little space and also cleans
    up the code.

  9. Introduce _init_provided() for BaseFigure and BasePlotlyType
    that calls a helper function _initialize_provided() to replace
    repetitions of:

_v = arg.pop("something", None)
_v = something if something is not None else _v
if _v is not None:
    self["something"] = _v

Used in #5008.

@gvwilson gvwilson added P2 considered for next cycle feature something new labels Jan 22, 2025
@gvwilson gvwilson self-assigned this Jan 22, 2025
@gvwilson gvwilson force-pushed the codegen2 branch 2 times, most recently from 0c7ea5e to 2b3d47b Compare January 23, 2025 17:56
gvwilson added a commit that referenced this pull request Jan 24, 2025
Removing the check for Python < 3.7 using `sys.version_info` and as a
backup checking `typing.TYPE_CHECKING`; this saves us a little space
and also cleans up the code. Proposing this as an enhancement beyond
what's in the `codegen2` branch / PR #4978.
@gvwilson gvwilson force-pushed the codegen2 branch 2 times, most recently from 000a5e9 to dbbbaa8 Compare January 27, 2025 17:12
1.  Add `bin/get_size.py` so that `python bin/get_size.py plotly build`
    reports the number of files and total size in bytes of the `plotly`
    directory (where generated code is put) and the `build` directory
    that is populated by `python setup.py build`.

1.  Modify `codegen/__init__.py` and `./setup.py` so that
    `python setup.py --reformat=false` disables reformatting.

1.  Assign an empty string to the `data_docs` field of generated
    validators.  (This has a major impact because those docs are
    duplicated many times.)

1.  Alias name of base validator during import in
    `codegen/validators.py`.

1.  Remove the long list of CSS colors from help strings for color
    properties.

1.  Replace `super(Parent, self)` with `super()` in generated code.

1.  Drop use of sys.version_info and TYPE_CHECKING.  Removed the check
    for Python < 3.7 using `sys.version_info` and as a backup checking
    `typing.TYPE_CHECKING`; this saves a little space and also cleans
    up the code.

1.  Remove mention of Chart Studio and explicit enumeration of system
    font names from plotly.js / plot-schema.json so that this text
    isn't copied dozens of times into the plotly.py bundle.

1.  Introduce `_init_provided()` for `BaseFigure` and `BasePlotlyType`
    that calls a helper function `_initialize_provided()` to replace
    repetitions of:

```
_v = arg.pop("something", None)
_v = something if something is not None else _v
if _v is not None:
    self["something"] = _v
```

Original size of plotly/**/*.py: 42283582 bytes
Current size of plotly/**/*.py:  31931739 bytes
Change: -25%
1.  Modify `commands.py` to run code generation.
1.  Remove comments from generated code.
1.  Replaced named arguments in constructors with positional arguments.
1.  Regenerate all code.

Notes:

The generated code is reformatted once again: this slightly increases
size but makes it more readable.

There is one flaky test:

`tests/test_plotly_utils/validators/test_colorscale_validator.py::test_acceptance_named[Inferno_r]`

It fails when the entire test suite is run but does *not* fail when
only `test_colorscale_validator.py` is run (using Python 3.11.9).

| branch   | format  | bytes    | %age |
| -------- | ------- | -------- | ---- |
| master   | .whl    | 14803768 |      |
| codegen2 | .whl    | 12215667 | -18% |
| master   | .tar.gz |  8100014 |      |
| codegen2 | .tar.gz |  6114458 | -24% |
@gvwilson
Copy link
Contributor Author

see also #4951

1.  Update required version of `black` in `pyproject.toml` and `.circleci/config.yml`.
2.  Update Python version identifiers in `tool.black` section of `pyproject.toml` to include `py311` and `py312`.
3.  Modify invocation of `black` in `commands.py` to format with `py311`.
4.  Regenerate and reformat code.
@emilykl
Copy link
Contributor

emilykl commented Apr 3, 2025

@gvwilson High-level question -- what is the practical impact of some of these changes on the docs / docstrings, for example, removing the list of CSS colors or the list of dict properties?

Completely agree that having those large chunks of text repeated many times over is a very poor use of bundle bytes. But are there cases where it makes the docstrings less useful, and if so are there any changes we need to make in the docs to mitigate the impact?

@emilykl
Copy link
Contributor

emilykl commented Apr 3, 2025

@gvwilson Could we add a single line at the top of all codegen files saying # THIS FILE IS AUTO-GENERATED?

I realize that runs exactly opposite to the goal of reducing bundle size, but it would make development a lot easier to have all code-generated files identified in the file itself.

(Could be a separate PR.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this file showing up in the diff? Bad merge? Should probably just defer to whatever version of this file is on main.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

@emilykl
Copy link
Contributor

emilykl commented Apr 3, 2025

@gvwilson Looks good, see my comments for some questions I have, but none of them are blockers to merging — except possibly the removal of the TYPE_CHECKING if clause, which I don't fully understand.

@gvwilson
Copy link
Contributor Author

gvwilson commented Apr 7, 2025

@gvwilson High-level question -- what is the practical impact of some of these changes on the docs / docstrings, for example, removing the list of CSS colors or the list of dict properties?

Completely agree that having those large chunks of text repeated many times over is a very poor use of bundle bytes. But are there cases where it makes the docstrings less useful, and if so are there any changes we need to make in the docs to mitigate the impact?

@emilykl agreed - we will need to provide a canonical list of colors and fonts. @LiamConnors would you like to figure out where you'd like to put this and then I can add the URL into plotly.js (to be copied into plotly.py by codegen) that can stay in the docstring?

@gvwilson
Copy link
Contributor Author

gvwilson commented Apr 7, 2025

@gvwilson Could we add a single line at the top of all codegen files saying # THIS FILE IS AUTO-GENERATED?

Excellent idea - it will only add a few bytes per file but will save a lot of grief.

- Add caveat to the top of every machine-generated file to warn people
  not to edit them.
- Make name of initial value property setter more readable.
- Run black formatting with multiple Python versions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new P2 considered for next cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants