Skip to content
This repository was archived by the owner on Nov 17, 2025. It is now read-only.

Updated debugprint to print variable name for leaf nodes#720

Draft
oscargus wants to merge 1 commit intoaesara-devs:mainfrom
oscargus:newconstantprint
Draft

Updated debugprint to print variable name for leaf nodes#720
oscargus wants to merge 1 commit intoaesara-devs:mainfrom
oscargus:newconstantprint

Conversation

@oscargus
Copy link
Copy Markdown
Contributor

@oscargus oscargus commented Jan 2, 2022

Better approach than in #719. Now debugprint use the newly introduced
_debug_str property (fallback to str). This is also used if name
is not set.

Now the output is as:

In [3]: z = at.constant(1, name='z')
   ...: y = z + 1; 
   ...: y.name = 'y'

In [4]: dprint(y)
Elemwise{add,no_inplace} [id A] 'y'   
 |TensorConstant{1} [id B] 'z'
 |TensorConstant{1} [id C] ''

In [5]: z
Out[5]: z

In [6]: y
Out[6]: y

Is this more like you expected @ricardoV94 ?

Thank you for opening a PR!

Here are a few important guidelines and requirements to check before your PR can be merged:

  • There is an informative high-level description of the changes.
  • The description and/or commit message(s) references the relevant GitHub issue(s).
  • pre-commit is installed and set up.
  • The commit messages follow these guidelines.
  • The commits correspond to relevant logical changes, and there are no commits that fix changes introduced by other commits in the same branch/BR.
  • There are tests covering the changes introduced in the PR.

Don't worry, your PR doesn't need to be in perfect order to submit it. As development progresses and/or reviewers request changes, you can always rewrite the history of your feature/PR branches.

If your PR is an ongoing effort and you would like to involve us in the process, simply make it a draft PR.

@oscargus oscargus changed the title Changed str-printing of Variable. Updated debugprint to print variable name for leaf nodes Jan 2, 2022
Better approach than in aesara-devs#719. Now debugprint use the newly introduced
`_debug_str` property (fallback to `str`). This is also used if `name`
is not set.
@brandonwillard
Copy link
Copy Markdown
Member

Unfortunately, we still have some old Theano tests that use the debugprint output to confirm the form of graphs during testing. It's an unnecessary dependency that we need to remove, and it's what's causing these errors.

Regardless, these changes are bigger than #689 and would need to be considered as an entirely distinct refactoring of the debugprint format. As it concerns #719, I don't think there are any real issues (see my comment here), so it doesn't appear to need a follow up.

We can still consider making these changes, but we should change this PR to a draft and start going over the pros/cons of such a change.

@oscargus
Copy link
Copy Markdown
Contributor Author

oscargus commented Jan 2, 2022

Looking at the failed test, it seems like the previous approach was actually in line with what was already these. One may consider scrapping the value though.

@oscargus oscargus marked this pull request as draft January 2, 2022 19:33
@ricardoV94
Copy link
Copy Markdown
Contributor

I see the parallel with input variables, and in that case the previous PR is just fine.

As an enhancement I find the output after this PR more readable, we get the fact that it is a constant, the value, and the optional name.

Input variables could likely also be made more readable with the same sort of changes done here for the constants, by showing the type (scalar, vector, matrix, etc) and the optional name after the id.

I am not considering implementation/ maintenance challenges though. What do you guys think?

@brandonwillard brandonwillard added refactor This issue involves refactoring request discussion labels Jan 3, 2022
@brandonwillard
Copy link
Copy Markdown
Member

brandonwillard commented Jan 4, 2022

I see the parallel with input variables, and in that case the previous PR is just fine.

As an enhancement I find the output after this PR more readable, we get the fact that it is a constant, the value, and the optional name.

Input variables could likely also be made more readable with the same sort of changes done here for the constants, by showing the type (scalar, vector, matrix, etc) and the optional name after the id.

I am not considering implementation/ maintenance challenges though. What do you guys think?

It has a kind of uniformity I can appreciate, so I'm not against it; however, it seems like we should go all the way with the uniformity and not even use any Op-level settings (e.g. __str__ and the like). Instead, we can make _debugprint itself render the same type of line information for every Op (e.g. class name, __props__ in curly brackets, etc.).

This approach also works much better with our goal of simplifying the core Aesara objects, since that mostly involves the removal of external information from the instances and types (e.g. ultimately removing all the C compilation-related methods and members).

@ricardoV94
Copy link
Copy Markdown
Contributor

I like that idea yes. @oscargus is this something you would be interested in trying in this PR? If not feel free to close it and we'll open an issue to keep track of it

@oscargus
Copy link
Copy Markdown
Contributor Author

oscargus commented Jan 4, 2022

I've checked the code base a bit and realized that the usage of __str__ and __repr__ is not always how it is recommended. There is also (at least) one other issue related to that: #559

Basically, __repr__ should return a description for recreating the object and __str__ is used if a "simpler"/"better looking" string is required. Now, it seems like __str__ returns something with curly brackets and __repr__ sometimes use __str__, while, if anything, it should be the other way around (which I also think is the default, if __str__ doesn't exist, it uses __repr__).

The main question is of course how debugprint should work, but also I think it is useful, in general, to actually return a valid __repr__, at least in the long run. So my suggestion is something like this:

  • Make __repr__ return a string that can be used to recreate the object (although with __str__ of the contained objects as this can lead to infinite recursions otherwise...), i.e., valid Python code.
  • Use .name for __str__ if it exists, otherwise fall back to __repr__ (here one may have to check that there cannot be any infinite recursions, but I do not know if it is possible to introduce cycles? Consider something like a = {}, b = {'a': a}, a['b'] = a which wouldn't print if the actual content of the contained object was printed in detail.)
  • Introduce _debug_str which is used in debugprint to get nicely styled info (curly brackets etc), basically reusing the current __str__/__repr__.

If you agree on this, I can definitely start working on it.

Edit: maybe one should fallback to _debug_str for __str__ if it exists as this probably looks a bit more readable etc. And will not have the cyclic issues.

@brandonwillard
Copy link
Copy Markdown
Member

Basically, __repr__ should return a description for recreating the object and __str__ is used if a "simpler"/"better looking" string is required. Now, it seems like __str__ returns something with curly brackets and __repr__ sometimes use __str__, while, if anything, it should be the other way around (which I also think is the default, if __str__ doesn't exist, it uses __repr__).

Yes, this is something that has bothered me for a while. It's even come up in a completely different scenario.

We do need to be careful here, though, because—given the nature of our objects (i.e. graphs)—there's the potential for a true "readable" __repr__ to become unwieldy. For example, the __repr__s in symbolic-pymc would become too large and unreadable, and this made interactive debugging rather difficult. It was possible to compensate using IPython's pretty printer mechanisms and the like, but those conveniences could only be afforded in certain circumstances (e.g. when using IPython or making changes to the interactive sessions).

  • Make __repr__ return a string that can be used to recreate the object (although with __str__ of the contained objects as this can lead to infinite recursions otherwise...), i.e., valid Python code.

I'm generally for this, with awareness of the issue mentioned above, of course.

  • Use .name for __str__ if it exists, otherwise fall back to __repr__ (here one may have to check that there cannot be any infinite recursions, but I do not know if it is possible to introduce cycles? Consider something like a = {}, b = {'a': a}, a['b'] = a which wouldn't print if the actual content of the contained object was printed in detail.)

This is fine, and there shouldn't be any issues with cycles. One technically can force a cycle into a graph, but that would be a rather odd intentional effort, so there's really no need to worry about it.

  • Introduce _debug_str which is used in debugprint to get nicely styled info (curly brackets etc), basically reusing the current __str__/__repr__.

As I mentioned above, it's best that we make the output format of debugprint as uniform as possible and entirely contained within itself. In other words, we probably don't want the classes themselves determining how they're printed, which a _debug_str-like method would imply.

We can use the format already defined by the current generic __str__ format, and just move it into debugprint. I think debugprint might already (re)construct those types of strings in certain cases, so we might only need to make that the only form of string construction that's used.

@oscargus
Copy link
Copy Markdown
Contributor Author

oscargus commented Jan 6, 2022

we probably don't want the classes themselves determining how they're printed, which a _debug_str-like method would imply.

I think that could be a quite nice approach though. Basically because classes could have quite different properties, so different classes may want to print differently. At first glance, I cannot really see any other option than an "if isinstance(expr, class)"-list and from that perspective, I believe it is not really a good solution. Also, adding a new class is either adding a well defined method to the class or trying to figure out where in the hierarchy (considering ordering, where parent classes must be after subclasses in the if-list), is more tricky. But I'll have a check of the actual code.

(I'm also probably biased from the way that SymPy does it, which I've worked with quite a bit and find to work neatly. On the other hand, it may be that there spread is much larger there when it comes to possible outcomes.)

@brandonwillard
Copy link
Copy Markdown
Member

I think that could be a quite nice approach though. Basically because classes could have quite different properties

All the relevant properties should be available via Op.__props__, when they're not, then it's something that needs to be fixed. (NB: We haven't refactored enough to use something like __slots__ to automatically enforce this strict "listed" properties requirement, but it's on the roadmap: #421.)

In other words, we necessarily should be able to implement a uniform debug printing format without using the graph objects (e.g. Op, Type, Apply, etc.) themselves to produce/define said format.

In general, this library is trying to move toward simpler types and a clearer separation between type/class definitions and functionality that depends on them.

You can see this in our (ongoing) move away from the old C backend that promoted a mess of specialized c_* methods and members. That approach added too much unnecessary coupling between the C transpilation process and the graph objects, which caused logic and concerns to bleed in from one interface to the other, making it extremely difficult to refactor, understand, etc. It's a common pitfall, and one we are both recovering from and trying to avoid.

When possible, we prefer to use generic functions/singledispatch, or any approach that keeps functionality relatively self-contained and unlikely to unnecessarily impose upon the objects with which it operates.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

refactor This issue involves refactoring request discussion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants