-
Notifications
You must be signed in to change notification settings - Fork 230
Add comprehensive style guide for cuda/core/experimental #1316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Link to rendered document: style-guide.md |
|
I'm interested in feedback on this idea. With a comprehensive style guide, aside from the other benefits listed in the description, we can use LLMs to help refactor, clean, and document the code. If the team is comfortable with this approach, I can begin to apply rules conservatively to the code base. |
6d4003f to
193330f
Compare
This commit adds a complete style guide covering conventions for Python and Cython code in cuda/core/experimental. The guide includes: - File structure and organization (SPDX headers, imports, __all__, ordering) - Package layout (.pyx/.pxd/.py files, subpackage patterns) - Import statement organization (5 groups with alphabetical sorting) - Class and function definition ordering (dunder methods, methods, properties) - Naming conventions (PascalCase, snake_case, UPPER_SNAKE_CASE) - Type annotations (PEP 604 union syntax, forward references) - Docstrings (NumPy style with comprehensive examples) - Error handling and warnings (custom exceptions, stacklevel, one-time warnings) - Memory management (resource lifecycle, cleanup patterns) - Thread safety and concurrency (locks, thread-local storage) - Cython-specific features (cdef/cpdef/def, nogil, inline functions) - Constants and magic numbers (naming, CUDA constants) - Comments and inline documentation (TODO, NOTE patterns) - Code organization within files - Performance considerations (GIL management, C types) - API design principles (public vs private, backward compatibility) - CUDA-specific patterns (GIL management for driver API calls) - Copyright and licensing (SPDX format) The guide follows PEP 8 as the base and promotes modern Python practices (PEP 604, PEP 563) while documenting current codebase patterns.
193330f to
8f6f10b
Compare
|
Will look more closely asap. Drive-by comment: I was surprised to see the new file under |
I thought about that, too. I agree it belongs at the root. |
Document the two-phase development approach: - Phase 1: Start with Python driver implementation and tests - Phase 2: Optimize by switching to cydriver with nogil blocks Includes step-by-step conversion guide and before/after examples.
Do we want this in the root? I'd expect it within a docs folder. That shouldn't impact an LLMs ability to discover and utilize it. |
cpcloud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't blocking, but I have a few concerns:
- I am a bit concerned about the team all using various agents with this an getting a variety of different results, with no way to actually enforce any of these guidelines without the human effort of checking the guide, and comparing a give changeset to it.
- Given the extensive guidelines, I think this will actually add human time for: 1) waiting for the LLM to take action, 2) babysitting it when it, as expected, doesn't follow the guidelines in an acceptable-to-the-author way.
|
|
||
| Files in `cuda/core/experimental` must follow a consistent structure. The ordering of elements within a file is as follows: | ||
|
|
||
| ### 1. SPDX Copyright Header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm -1 on including things in this guide that can be automated, and I realize that results in a lack of completeness.
In fact, I'd rather our lint rule for this particular thing just add it to the top of the file when it fails.
I think this can be done for all file types, including pyx files.
|
|
||
| ### 3. `__all__` Declaration | ||
|
|
||
| If the module exports public API elements, include an `__all__` list after the imports and before any other code. This explicitly defines the public API of the module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly misleading, since __all__ is about what happens when you star-import a module.
+1 on defining this, but stating it defines the public API I would remove from the description.
| LEGACY_DEFAULT_STREAM = C_LEGACY_DEFAULT_STREAM | ||
| ``` | ||
|
|
||
| ### 5. Principal Class or Function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit mixed on enforcing ordering of elements here. I do think it can help cut review time, but in practice I find that this kind of strict ordering only starts mattering when files get huge, to which I would probably say "Split the file up" 😆
| - Not every file will have all sections. For example, a utility module may not have a principal class. | ||
| - The distinction between "principal" and "other" classes is based on the file's primary purpose. If a file exists primarily to define one class, that class is the principal class. | ||
| - Private implementation functions should be placed at the end of the file to keep the public API visible at the top. | ||
| - **Within each section**, classes and functions should be sorted alphabetically by name. The principal class or function is an exception to this rule, as it appears first in its respective section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this should be enforced without a linter. It feels a bit draconian to me to require someone to reorder their methods.
For example, I like to order methods by "relevance radius", which is a squishy concept that is something like "methods that are related to each other are close to each other". I find this makes debugging way easier, if only because I can see more related code in a single vim window.
|
|
||
| 2. **Keep `.pxd` files minimal**: Only include declarations needed for Cython compilation. Omit implementation details, docstrings, and Python-only code. | ||
|
|
||
| 3. **Use `__all__` in submodules**: Each submodule should define `__all__` to explicitly declare its public API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The public API bit seems to be repeated in a bunch of places, and it's misleading. That said, I'm unsure of a better way to phrase it, but perhaps there's a slightly less "sure" way of stating it?
How about just "Each submodule should define __all__"?
|
|
||
| ### Implementation Comments | ||
|
|
||
| Add comments to explain complex logic or non-obvious behavior: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really question whether this is useful for humans. I suppose it may be useful for an LLM 🤷🏻
|
|
||
| ### Prefer `cdef` for Internal Functions | ||
|
|
||
| Use `cdef` functions for internal operations that don't need to be callable from Python: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this already specified above?
| result = [] | ||
| for i in range(n): | ||
| result.append(i) | ||
|
|
||
| # Preferred: Use C array or pre-allocate | ||
| cdef int* c_result = <int*>malloc(n * sizeof(int)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing as how these are not equivalent, it seems odd to say "prefer X over Y" when X and Y are very different. Sometimes you want a list, sometimes you don't need it and can use a C array.
| Use naming conventions to distinguish public and private APIs: | ||
|
|
||
| - **Public API**: No leading underscore, documented in docstrings, included in `__all__` | ||
| - **Private API**: Leading underscore (`_`), may have minimal documentation, not in `__all__` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since __all__ only affects star import behavior, this is redundant, since leading underscore names are not included in star imports.
|
|
||
| ### Guidelines | ||
|
|
||
| 1. **Placement**: The copyright header must be the first lines of the file, before any imports or other code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this covered by the "all files must include a copyright header"?
mdboom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a defacto standard for where to put this kind of stuff and tell the IDE where and when to apply it.
| - **Abstract base classes**: ABCs that define interfaces (e.g., `MemoryResource` in `_buffer.pyx`) | ||
| - **Other public classes**: Additional classes exported by the module | ||
|
|
||
| **All classes and functions in this section should be sorted alphabetically by name**, regardless of their relationship to the principal class. The principal class appears first as an exception to this rule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 on alphabetical ordering. I think we should reflect the logical relationships between classes instead (which I understand is a bit mushy as a concept). It's easy enough in an IDE to jump to the one you are looking for using search.
|
|
||
| - Functions with names starting with `_` (private) | ||
| - `cdef inline` functions used for internal implementation | ||
| - Helper functions not part of the public API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is fine, but seems backward to me, given that languages like C have always required forward declarations to do things like this, code tends to be ordered from low-level to high-level, with the "principal public thing" at the bottom, not the top. I realize Python has never had this problem, but it does seems reversed from almost anything else.
| The `cuda/core/experimental` package uses three types of files: | ||
|
|
||
| 1. **`.pyx` files**: Cython implementation files containing the actual code | ||
| 2. **`.pxd` files**: Cython declaration files containing type definitions and function signatures for C-level access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For the future): We really should be requiring that all .pyx files have a corresponding .pyi file. Without it, modern IDEs (which don't actually import modules) can not perform auto completion and cross referencing. But we can add that later as an effort to improve the IDE experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 2025, is there a way we can auto-generate .pyi from .py/.pxd/.pyx? If not, let's not make it a must.
One thing I hate the most is Python typing. Does not make any sense to me to maintain 2-3 files.
| For each `.pyx` file that defines classes or functions used by other Cython modules, create a corresponding `.pxd` file: | ||
|
|
||
| - **`.pxd` file**: Contains `cdef` class declarations, `cdef`/`cpdef` function signatures, and `cdef` attribute declarations | ||
| - **`.pyx` file**: Contains the full implementation including Python methods, docstrings, and implementation details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(As above): The better way is to include docstrings and type annotations in a .pyi file and just implementation in the .pyx file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Cython would honor .pyi files? To the very least, I don't think it is friendly to write .pyi files when the types are needed to complete a cdef function?
|
|
||
| 6. **Separate concerns**: Use `.py` files for pure Python utilities, `.pyx` files for Cython implementations that need C-level performance. | ||
|
|
||
| ## Import Statements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I were writing this for a human, I would take this section (and some other sections below that are basically "PEP8") and just link to them so we aren't maintaining duplication of community standards. If the LLM can handle something like that, I think we should do that here. Given "agents", shouldn't they just be able to run ruff?
|
|
||
| #### Cython `cdef` Variables | ||
|
|
||
| Consider prefixing `cdef` variables with `c_` to distinguish them from Python variables. This improves code readability by making it clear which variables are C-level types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
|
||
| ```python | ||
| LEGACY_DEFAULT_STREAM = C_LEGACY_DEFAULT_STREAM | ||
| PER_THREAD_DEFAULT_STREAM = C_PER_THREAD_DEFAULT_STREAM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, related constants should be grouped into enums.
| def __dealloc__(self): | ||
| """Automatic cleanup when object is garbage collected.""" | ||
| DMR_close(self) | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we recommend that all such objects that require cleanup define a context manager, i.e. __enter__ and __exit__ methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now. I can write a section explaining the current rationale in a separate PR if it is OK.
|
|
||
| 5. **Use stream-ordered deallocation**: When deallocating buffers, use the appropriate stream for asynchronous cleanup to avoid blocking operations. | ||
|
|
||
| 6. **Track resource ownership**: Clearly document which objects own CUDA handles and are responsible for cleanup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we do better than just documenting, by specifying all objects contain a strong reference to their owner to prevent use-after-free errors etc.?
EDIT: Though maybe for a style guide this is too complex of an issue. For style, documenting that such a situation exists is the first requirement for actually patching these sort of holes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, some of the things I'd like to document would more properly belong in a "Developer's Guide." I am therefore considering whether to expand the scope of this document. I'd restructure it into sections where "Style" is one section.
|
|
||
| ### Inline Functions | ||
|
|
||
| Use `inline` for small, frequently-called functions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for Cython, which is not a modern C compiler ;) IME, sometimes the C compiler can see through it and make the right choice and other times not. The salient point for me is that we would ideally backup the choice of inline with a benchmark.
|
Merging my offline comment to here. My 2c:
|
Summary
This PR adds a comprehensive style guide for Python and Cython code in
cuda/core/experimental. The guide establishes clear conventions and best practices to maintain code quality and consistency across the codebase.What's Included
The style guide covers 18 major areas:
__all__, classes, functions).pyx,.pxd, and.pyfiles, subpackage patternsX | Y | None), forward referencescdef/cpdef/def,nogil, inline functionsKey Principles
X | Yunion syntax) and PEP 563 (from __future__ import annotations)Benefits
File Added
cuda/core/style-guide.md(1,818 lines)Next Steps
After merge, this guide can be: