Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update only changed fields. see issue #2 for details #45

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

CastixGitHub
Copy link
Contributor

No description provided.

@CastixGitHub
Copy link
Contributor Author

build is failing due to codecov

@codecov
Copy link

codecov bot commented Mar 18, 2022

Codecov Report

Merging #45 (795874d) into master (b22af7c) will increase coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #45      +/-   ##
==========================================
+ Coverage   89.96%   90.10%   +0.13%     
==========================================
  Files          43       43              
  Lines        6636     6718      +82     
==========================================
+ Hits         5970     6053      +83     
+ Misses        666      665       -1     
Flag Coverage Δ
tests-3.5 89.76% <100.00%> (+0.14%) ⬆️
tests-3.6 89.97% <100.00%> (+0.14%) ⬆️
tests-3.7 89.97% <100.00%> (+0.14%) ⬆️
tests-3.8 90.11% <100.00%> (+0.13%) ⬆️
tests-3.9 90.11% <100.00%> (+0.13%) ⬆️
tests-pypy3 89.97% <100.00%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
ming/odm/mapper.py 89.33% <100.00%> (+0.25%) ⬆️
ming/session.py 89.02% <100.00%> (+0.20%) ⬆️
ming/tests/odm/test_declarative.py 99.58% <100.00%> (+0.03%) ⬆️
ming/tests/odm/test_mapper.py 100.00% <100.00%> (ø)
ming/tests/test_session.py 99.30% <100.00%> (ø)
ming/utils.py 74.13% <100.00%> (+1.91%) ⬆️
ming/odm/odmsession.py 68.48% <0.00%> (+0.32%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b22af7c...795874d. Read the comment docs.

@CastixGitHub
Copy link
Contributor Author

see #2

@brondsem
Copy link
Collaborator

brondsem commented Apr 1, 2022

Hi,

I haven't had a chance to look at this yet - but wanted to say I have seen it and it looks quite interesting.

Just a little update to let you know your contribution is appreciated and seen. I hope to take a serious look at it before too much longer.

@brondsem
Copy link
Collaborator

Hi, thanks for working on this and for your patience.

Could you add a simple test that uses replace_session? At least enough to get some test coverage would be nice.

I ran these changes against a large app that uses Ming (Apache Allura, which I work on). I found two issues:

  1. When a model has a before_save hook which modifies data then that field will get missed. (This was discovered by Allura's forgetracker.tests.functional.test_root:TestFunctionalController.test_touch test) Here's a little code to illustrate what triggers it:
class MyThing:
    class __mongometa__:
        ...
        def before_save(data):
            data['mod_date'] = datetime.utcnow()

We should probably add a Ming test that has this kind of behavior.
And then how to fix it? I'm not sure what a nice solution would be. Your new fields logic runs before session.impl.save which is what calls self._prep_save and the hooks. So we'd have to somehow move the new logic after self._prep_save, or some other moving or passing of vars. Maybe before_save hooks should return a list of fields they modified, and then include that in the save fields? It would be a breaking change and require existing before_save hooks to change, but maybe that'd be the cleanest.

  1. is related to new objects. I haven't debugged it completely, it might have something to do with _id field being special and automatically populated on new objects by pymongo, and ming is skipping it? Anyway I can fix the Allura tests affected by adding an if state.original_document. Its probably better this way anyway: only do the new fields logic if its an update, and new objects work the previous way (saving all the fields):
            if state.original_document:
                # here we do a symmetric difference to see what fields did change
                fields = tuple(set((k for k, v in
                                    doc_to_set(state.original_document)
                                    ^ doc_to_set(state.document))))
            else:
                fields = ()

@CastixGitHub
Copy link
Contributor Author

state.options isn't documented
"the ming foundation layer's save args parameter" is undocumented too
so adding the state parameter, and moving the fields logic there, then why do we take args as a parameter if we are now detecting what changed?
this is a breaking change, and a test was in place.
do you think is there a better approach?
can you re-run with allura?
Thank you for reviewing!

@brondsem
Copy link
Collaborator

I think this is a good approach to dealing with it.

I get this test error, not sure why the CI tests didn't error on it:

======================================================================
ERROR: test_writable_backref (ming.tests.odm.test_declarative.TestRealMongoRelation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/brondsem/sf/ming/ming/tests/odm/test_declarative.py", line 191, in test_writable_backref
    self.session.flush()
  File "/Users/brondsem/sf/ming/ming/odm/base.py", line 35, in inner
    result = func(obj, *args, **kwargs)
  File "/Users/brondsem/sf/ming/ming/odm/odmsession.py", line 99, in flush
    self.uow.flush()
  File "/Users/brondsem/sf/ming/ming/odm/unit_of_work.py", line 49, in flush
    unow(obj, st)
  File "/Users/brondsem/sf/ming/ming/odm/base.py", line 35, in inner
    result = func(obj, *args, **kwargs)
  File "/Users/brondsem/sf/ming/ming/odm/odmsession.py", line 115, in update_now
    mapper(obj).update(obj, st, self, **kwargs)
  File "/Users/brondsem/sf/ming/ming/odm/base.py", line 35, in inner
    result = func(obj, *args, **kwargs)
  File "/Users/brondsem/sf/ming/ming/odm/mapper.py", line 88, in update
    ret = session.impl.save(doc, validate=False, state=state)
  File "/Users/brondsem/sf/ming/ming/session.py", line 23, in wrapper
    return func(self, doc, *args, **kwargs)
  File "/Users/brondsem/sf/ming/ming/session.py", line 165, in save
    result = self._impl(doc).update(
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/collection.py", line 3325, in update
    return self._update_retryable(
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/collection.py", line 868, in _update_retryable
    return self.__database.client._retryable_write(
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1498, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1384, in _retry_with_session
    return self._retry_internal(retryable, func, session, bulk)
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1416, in _retry_internal
    return func(session, sock_info, retryable)
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/collection.py", line 860, in _update
    return self._update(
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/collection.py", line 837, in _update
    _check_write_command_response(result)
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/helpers.py", line 226, in _check_write_command_response
    _raise_last_write_error(write_errors)
  File "/Users/brondsem/sf/ming/.tox/py39/lib/python3.9/site-packages/pymongo/helpers.py", line 208, in _raise_last_write_error
    raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: ('\'$set\' is empty. You must specify a field like so: {$set: {<field>: ...}}, full error: {\'index\': 0, \'code\': 9, \'errmsg\': "\'$set\' is empty. You must specify a field like so: {$set: {<field>: ...}}"}', "doc:  {'_id': 0, 'parent_id': 1}")

I would like to keep the save() behavior that lets a list of fields be passed in. Even though its not documented, there was a test for it and I know I have some code that uses it. I think the method could work like this. It also fixes the above error if the args/fields ends up empty.

    @annotate_doc_failure
    def save(self, doc, *args, state=None, **kwargs):
        data = self._prep_save(doc, kwargs.pop('validate', True))
        if not args and state is not None and state.original_document:
            args = tuple(set((k for k, v in
                              doc_to_set(state.original_document)
                              ^ doc_to_set(data))))
        if args:
            values = dict((arg, data[arg]) for arg in args)
            result = self._impl(doc).update(
                dict(_id=doc._id), {'$set': values}, **fix_write_concern(kwargs))
        else:
            result = self._impl(doc).save(data, **fix_write_concern(kwargs))
        if result and '_id' not in doc:
            doc._id = result
        return result

However, I have found another bug, a very subtle one. Here's a simple test case. If you do multiple updates within a session and one of the updates puts a field back to the original value, that is not detected as a change so it is not saved. Not sure what the right solution to this would be. Maybe original_document could be updated after a flush/save? But I don't know what other impacts that would make.

class TestBasicMapping(TestCase):
    ...
    def test_multiple_update_flushes(self):
        initial_doc = self.Basic()
        initial_doc.a = 1
        self.session.flush()
        self.session.close()

        doc_updating = self.Basic.query.get(_id=initial_doc._id)
        doc_updating.a = 2
        self.session.flush()
        doc_updating.a = 1  # back to "initial" value
        doc_updating.e = 'foo'  # change something else too
        self.session.flush()
        self.session.close()

        doc_after_updates = self.Basic.query.get(_id=doc_updating._id)
        assert doc_after_updates.a == 1```

@CastixGitHub
Copy link
Contributor Author

CastixGitHub commented May 4, 2022

Ming seems to use original_document only inside update_if_not_modified
and as long as not modified "since first queried" means "since the last time it was queried" it should be ok to update it... (letting the users know it changed)

Isn't the state meant to recover data from a bad update "transaction"?
I see after an update the state status is set to clean

clean state (which means that they have not been modified since the last flush to the session)

then I consider not updating original_document a bug that you spotted

You're absolutely right about keeping args, so one could eventually bypass field detection or just skip some of them

I'm unable to reproduce the error you reported, is it somehow apple related?

Thank you

@brondsem
Copy link
Collaborator

Yea as far as I can tell it should be ok to update original_document. I'm kind of surprised that field isn't used in more places. So hopefully no subtle bugs will appear if we update it :)

Regarding the error I got, I'm not sure why it happened really. But if we go with the def save code that I posted, it avoids the error (when args is empty for whatever reason, the code will save the whole object instead of doing $set with nothing)

@CastixGitHub
Copy link
Contributor Author

Thank you Dave 😊

@brondsem
Copy link
Collaborator

Hi again. When I ran the Allura tests with these ming changes, I got some really deep exceptions related to deepcopy. I wonder if copy would be good enough? Or no copying at all? So I tried it and the ming tests still pass. However Allura tests got some other errors when I tried just copy or no copying, I'll have to investigate that more next week. What do you think?

Here's the deepcopy errors

  File "/src/allura/Allura/allura/model/monq_model.py", line 240, in __call__
    session(self).flush(self)
  File "/src/ming/ming/odm/base.py", line 35, in inner
    result = func(obj, *args, **kwargs)
  File "/src/ming/ming/odm/odmsession.py", line 105, in flush
    self.update_now(obj, st)
  File "/src/ming/ming/odm/base.py", line 35, in inner
    result = func(obj, *args, **kwargs)
  File "/src/ming/ming/odm/odmsession.py", line 115, in update_now
    mapper(obj).update(obj, st, self, **kwargs)
  File "/src/ming/ming/odm/base.py", line 35, in inner
    result = func(obj, *args, **kwargs)
  File "/src/ming/ming/odm/mapper.py", line 92, in update
    state.original_document = deepcopy(doc)
  File "/usr/local/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.7/copy.py", line 307, in _reconstruct
    value = deepcopy(value, memo)
  File "/usr/local/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.7/copy.py", line 307, in _reconstruct
    value = deepcopy(value, memo)
  File "/usr/local/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.7/copy.py", line 221, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/usr/local/lib/python3.7/copy.py", line 221, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/usr/local/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.7/copy.py", line 221, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/usr/local/lib/python3.7/copy.py", line 221, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/usr/local/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.7/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.7/copy.py", line 281, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.7/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.7/copy.py", line 241, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.7/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/var/local/env-allura/lib/python3.7/site-packages/pymongo/collection.py", line 3444, in __call__
    self.__name)
TypeError: 'Collection' object is not callable. If you meant to call the '__deepcopy__' method on a 'Database' object it is failing because no such method exists.```

@CastixGitHub
Copy link
Contributor Author

What do you think?

that we need copy on write

@brondsem
Copy link
Collaborator

brondsem commented Jun 2, 2022

I will at least merge the replace_session changes, since that is simple enough.

For the rest (updating only changed fields) it certainly is annoying that we've gotten so close yet keep running into problems for certain situations. I haven't had much time to spend on this, but when I get a chance I do want to investigate the error I found where deepcopy tries to copy a Collection since that seems pretty weird. I also wonder if there's any other way to deal with original_document that sidesteps the issue. Like maybe set it to None during flush, and don't even try to maintain it after that?

brondsem pushed a commit that referenced this pull request Jun 2, 2022
brondsem added a commit that referenced this pull request Jun 2, 2022
@brondsem
Copy link
Collaborator

brondsem commented Jun 2, 2022

I have merged the replace_session helper. And all the new tests, since they are good to have no matter what happens with this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants