Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ActiveModel::Datastore.save_all feature #27

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

shao1555
Copy link
Contributor

Motivation

I want to use bulk insert feature to improve performance.

Resolution

implement ActiveModel::Datastore.save_all(entries)
in case of 500 records insertion, 200x faster than before.

Remark

I'm not good for English. please feel free to update comment and document :)

@shao1555
Copy link
Contributor Author

oh.. I'll fix code style reported by RuboCop.

@shao1555
Copy link
Contributor Author

@bmclean please review it.

@bmclean bmclean self-requested a review April 18, 2018 15:16
@bmclean bmclean self-assigned this Apr 18, 2018
@bmclean
Copy link
Member

bmclean commented Apr 20, 2018

Hi @shao1555. I like the concept. Hopefully I will have time to review it in more detail tomorrow.

@@ -161,6 +161,18 @@ class User
end
```

To improve performance, you can save many entities at once. (Batch operation)
Copy link
Member

@bmclean bmclean Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this position in the README we have not yet mentioned the concept of parent/ancestor. Let's move this new section down after Datastore Consistency.

Copy link
Member

@bmclean bmclean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some really nice work @shao1555!

```ruby
group = Group.find(params[:group_id])
users = %w{ alice bob charlie }.map{ |name| User.new(name: name) }
if ActiveModel::Datastore.save_all(users, parent: group)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe passing in the group model object to parent will fail. It needs to be a Google::Cloud::Datastore::Key.

end
end

test 'batch save' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add a test where the parent is explicitly passed in.

@bmclean bmclean assigned shao1555 and unassigned bmclean Apr 21, 2018
sliced_entries
end.flatten
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would we expect this method to return? It is currently returning an array of model objects. Should it return an array of Google::Cloud::Datastore::Entity instead?

Also, is the results.present? line needed?

@bmclean
Copy link
Member

bmclean commented Apr 27, 2018

@shao1555 This is ready for your feedback.

@NielsKSchjoedt
Copy link

NielsKSchjoedt commented Mar 5, 2021

Hi @bmclean any particular reason why was this not merged yet? I am looking for the same functionality :-) I'll do a fork and try to clean it up.

@bmclean
Copy link
Member

bmclean commented Mar 5, 2021

Hi @NielsKSchjoedt. This was a proof of concept that was never completed. We left it here in case somebody wanted to pick it up again.

@bmclean
Copy link
Member

bmclean commented Mar 5, 2021

Also, one thing we have learned since this feature was started is that a batch size of 500 won't always work.

See entries.each_slice(500).map

While 500 is the max number of entities that can be passed to Datastore sometimes we hit the size limit first. We should set the default to 250 and allow a batch size to be passed in (up to 500).

@NielsKSchjoedt
Copy link

NielsKSchjoedt commented Mar 5, 2021

@bmclean great. If I am to finish it I will add both a save_all and a save_all! method. Where the latter will throw an exception if any objects are invalid. Agree? As for the return value of the methods I can think of several ideas. From top of my head I am wondering if something like: {User#1 => true, User#2 => false} - a hash where the keys are the ActiveModel objects with a value of true or false depending on whether they were successfully persisted or not. What's your thoughts?

@bmclean
Copy link
Member

bmclean commented Mar 5, 2021

@NielsKSchjoedt Sure, save_all and a save_all! methods would be ok. Yes, perhaps the return values should be ActiveModel. That way validation errors can be extracted if needed. I like your hash idea with the booleans, that would work. For users that wanted a successful count returned they can run something like .select { |k, v| v == true }.size on the result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants