Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: string subclass overriding eq #15735

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

nucccc
Copy link

@nucccc nucccc commented Jul 21, 2023

Fixes mypyc/mypyc#973

In order to avoid problem when subclasses inheriting from string and overriding __eq__ I applied the code from @ichard26 :

int CPyStr_Compare(PyObject *left, PyObject *right) {
    if (likely(PyUnicode_CheckExact(left) && PyUnicode_CheckExact(right))) {
        return PyUnicode_Compare(left, right);
    } else {
        int ret = PyObject_RichCompareBool(left, right, Py_EQ);
        if (ret == 1)
            return 0;
        if (ret == 0)
            return 1;
        return -1;
    }
}

This is a draft pull request. Main problem is that I has to create a special case in mypyc/irbuild/prepare.py to allow for a class inheriting from a string:

for cls in info.mro:
        # Special case exceptions and dicts
        # XXX: How do we handle *other* things??
        if cls.fullname == "builtins.BaseException":
            ir.builtin_base = "PyBaseExceptionObject"
        elif cls.fullname == "builtins.dict":
            ir.builtin_base = "PyDictObject"
        elif cls.fullname == "builtins.str":
            pass
        elif cls.fullname.startswith("builtins."):
            if not can_subclass_builtin(cls.fullname):
                # Note that if we try to subclass a C extension class that
                # isn't in builtins, bad things will happen and we won't
                # catch it here! But this should catch a lot of the most
                # common pitfalls.
                errors.error(
                    "Inheriting from most builtin types is unimplemented", path, cdef.line
                )

otherwise the error "Inheriting from most builtin types is unimplemented" would have occurred. That's why I added

elif cls.fullname == "builtins.str":
    pass

That's why I think it's worth to evaluate if this isn't too much a of a "risky" thing in the codebase.

If everything makes sense, I would keep on going by adding some minor documentation.

A basic test has been added.

Thanks for the attention.

@nucccc nucccc force-pushed the fix/overriding_string_subclass_eq branch from d265d0e to b1701a7 Compare July 21, 2023 17:47
@ichard26 ichard26 self-assigned this Jul 21, 2023
@ichard26
Copy link
Collaborator

ichard26 commented Aug 2, 2023

Heya, I was going to review this pull request, but it fell off my radar. Sorry about that!

The test should be using an interpreted str subclass, ie. one that is not compiled (take a closer look at the original issue):

[case testCustomCompare]
from interp import S

def test_custom_compare() -> None:
    s: str = S("x")
    assert not s == "x"
    assert not "x" == s

[file interp.py]
class S(str):
    def __eq__(self, other: object) -> bool:
        return False

I couldn't get it to work properly with != as that actually uses different dunders. I don't fully understand it myself so I don't feel like explaining the Python data model which gets... involved here, but the TL;DR is that != calls __ne__ which doesn't exist on the subclass so Python falls back to callling str.__ne__ which doesn't respect subclasses. FWIW, it seems like overriding __ne__ in the subclass is also broken. Not sure if that is already fixed by this patch or not.

@nucccc nucccc changed the title draft: attempt to fix string subclass overriding eq attempt to fix string subclass overriding eq Aug 7, 2023
@nucccc nucccc marked this pull request as ready for review August 7, 2023 21:05
@nucccc
Copy link
Author

nucccc commented Aug 7, 2023

Thank you very much for pointing me out.

I managed to write a dedicated function for ne comparison. One is the C function CPyStr_CompareNeq in mypyc/lib-rt/str_ops.c, and there is a corresponding unicode_compare_neq in mypyc/primitives/str_ops.py.

Then, in the compare_strings method of the LowLevelIRBuilder in mypyc/irbuild/ll_builder.py the two c functions are selected based on the operator used. In that if statement I tried to leave a brief comment, I hope it can be of help, I feel that someone with a better comprehension of the codebase can improve such comment.

I also did my best to improve the test data, with the folloqing two classes:

[file interp.py]
class StrEqualNever(str):
    def __eq__(self, other: object) -> bool:
        return False

    def __ne__(self, other: object) -> bool:
        return True

class StrEqualAlways(str):
    def __eq__(self, other: object) -> bool:
        return True

    def __ne__(self, other: object) -> bool:
        return False

I import them from a different file as you stated, I'm sorry I misunderstood that aspect of the issue.

I'm available for improvements, I hope this can be useful.

@nucccc nucccc changed the title attempt to fix string subclass overriding eq fix: string subclass overriding eq Aug 8, 2023
Copy link
Collaborator

@ichard26 ichard26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether @JukkaL would want to see this optimized more, but functionally this looks great.

Thanks for contributing a PR @nucccc!

mypyc/test-data/run-strings.test Show resolved Hide resolved
mypyc/test-data/run-strings.test Outdated Show resolved Hide resolved
mypyc/lib-rt/str_ops.c Outdated Show resolved Hide resolved
mypyc/irbuild/ll_builder.py Outdated Show resolved Hide resolved
@nucccc
Copy link
Author

nucccc commented Aug 17, 2023

Thank you very much @ichard26 for the review! I applied your modifications.

Ragarding optimization, I would say that the check on likelyness could be an overhead.

    if (likely(PyUnicode_CheckExact(left) && PyUnicode_CheckExact(right))) {
        return PyUnicode_Compare(left, right);
    }

I don't know if there is any way/necessity to let the user specify a parameter to avoid such check (thus resorting to the "risky" version in which no check on the real class is done). In case there was a chance to let such parameter be passed during code generation, that could become some compiler directives to avoid the generation of such if, maybe. I leave this idea as a pour parler, the only options I found were in mypyc/options.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overriding __eq__ in str subclass not supported
2 participants