Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using SelectFromDatabase with table IDs finds nils instead of ID numbers. #38

Open
Grammarella opened this issue Aug 25, 2016 · 5 comments

Comments

@Grammarella
Copy link

This is a great gem. I'm using 0.7.3 right now and the Blacklist strategy to anonymize some sensitive data. I'm looking forward to using the parallel table execution strategy, but for the sake of this bug report I am using the normal sequential execution.

I got very excited about the SelectFromDatabase method and endeavored to use it for a number of things in order to "scramble" table references. First, and this is minor, the example at http://www.rubydoc.info/github/sunitparekh/data-anonymization/DataAnon/Strategy/Field/SelectFromDatabase does not make it clear that I should reuse the connection_spec for each SelectFromDatabase call, though I did figure out that I should keep passing that in from the constructor details.

The actual issue is that I have tables such as features_trainings (in this example) where I want to scramble the feature referred to in the linking table. I want to take the feature_id foreign key and replace it such that it refers to any random feature in the features table. So for example, I have this:

        table 'features_trainings' do
          primary_key 'training_id', 'feature_id'
          batch_size 1000
          anonymize('feature_id').using FieldStrategy::SelectFromDatabase.new('features','id', my_connection_spec)
        end

(Though the composite key is used here, I observe the same issue with tables that have primary_key 'id' in use.)

What happens is that the values returned by anonymizing are all nils. I tracked this down to select_from_database.rb, line 17. The odd thing is that source.select(field_name) returns a value like this:

#<ActiveRecord::Relation [#<DataAnon::Utils::SourceDatabase::Features id: nil>, #<DataAnon::Utils::SourceDatabase::Features id: 
nil>, #<DataAnon::Utils::SourceDatabase::Features id: nil>, #<DataAnon::Utils::SourceDatabase::Features id: nil>, #<DataAnon::Ut
ils::SourceDatabase::Features id: nil>, #<DataAnon::Utils::SourceDatabase::Features id: nil>, #<DataAnon::Utils::SourceDatabase:
:Features id: nil>, #<DataAnon::Utils::SourceDatabase::Features id: nil>, #<DataAnon::Utils::SourceDatabase::Features id: nil>, 
#<DataAnon::Utils::SourceDatabase::Features id: nil>, ...]>

For some reason, all the IDs are nil. Interestingly,

source.select(field_name).first[field_name]

returns nil, but

source.select(field_name).first.id

returns the actual id number (1, 2, etc.). Part of me thinks that changing that call on that line slightly would fix it, but I have a sense that what is going on may be more subtle than that, since source produces all those nils.

Of course, one workaround is obvious, and I will move to doing my own separate query for feature ids and using SelectFromList. But it seemed to me that SelectFromDatabase could be a powerful tool, and I thought this was worth reporting. Thanks again for this gem and its documentation.

@sunitparekh
Copy link
Owner

sunitparekh commented Sep 5, 2016

hey, I released rail5 upgraded version 0.8.0.rc1 of data anonymisation and I fixed and tested SelectFromDatabase and it is working fine. refer example https://github.com/sunitparekh/test-anonymization/blob/master/dell_whitelist.rb

please try and provide me feedback

@Grammarella
Copy link
Author

Grammarella commented Sep 8, 2016

Thank you very much for looking into this. I appreciate it.

Some background information: I'm using data-anonymization in a rake task, and as I'm not ready to upgrade my entire project to Rails 5, I am testing this in a copied project that tries to approximate the original situation. I'm also using mysql2 as my database adapter.

Here's my test table:
table 'addresses' do
primary_key 'id'
batch_size 1000
anonymize('state').using FieldStrategy::SelectFromDatabase.new('addresses', 'state', db_config)
anonymize('country_id').using FieldStrategy::SelectFromDatabase.new('countries', 'id', db_config)
end

And this is my error:
Mysql2::Error: Unknown column 'addresses.' in 'order clause': SELECT addresses.* FROM addresses ORDER BY addresses.`` ASC LIMIT 1000

It's tempting to think it's a mysql2 error, but I don't think that's the case. I debugged in the activerecord gem, activerecored-5.0.0.1/lib/active_record/relation/batches.rb. In the in_batches method, the reorder call appears to be where this breaks. The problem is that batch order returns this: addresses.`` ASC

Looking at how batch_order is defined, it appears that the problem is that quoted_primary_key is empty, and in my debugging, this indeed seems to be the case. So it almost seems as if something about SelectFromDatabase is losing a concept of the primary key, even though I did set that manually in this example.

@olly
Copy link

olly commented Mar 6, 2018

I'm also experiencing this issue when using Rails v5.1.5. It only occurs when a batch_size is specified. The backtrace is see is:

Processing table payment_infos records in batch size of 1000
  DataAnon::Utils::SourceDatabase::Payment_infos Load (1.1ms)  SELECT  "payment_infos".* FROM "payment_infos" ORDER BY "payment_infos"."" ASC LIMIT $1  [["LIMIT", 1000]]

PG::SyntaxError: ERROR:  zero-length delimited identifier at or near """"
LINE 1: ...".* FROM "payment_infos" ORDER BY "payment_infos"."" ASC LIM...
                                                             ^
: SELECT  "payment_infos".* FROM "payment_infos" ORDER BY "payment_infos"."" ASC LIMIT $1 
 ["/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:616:in `async_exec'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:616:in `block (2 levels) in exec_no_cache'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/dependencies/interlock.rb:46:in `block in permit_concurrent_loads'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/concurrency/share_lock.rb:185:in `yield_shares'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/dependencies/interlock.rb:45:in `permit_concurrent_loads'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:615:in `block in exec_no_cache'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/abstract_adapter.rb:613:in `block (2 levels) in log'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/abstract_adapter.rb:612:in `block in log'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/notifications/instrumenter.rb:21:in `instrument'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/abstract_adapter.rb:604:in `log'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:614:in `exec_no_cache'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/postgresql_adapter.rb:603:in `execute_and_clear'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/postgresql/database_statements.rb:79:in `exec_query'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:371:in `select'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:42:in `select_all'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/connection_adapters/abstract/query_cache.rb:97:in `select_all'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/querying.rb:39:in `find_by_sql'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation.rb:678:in `exec_queries'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation.rb:546:in `load'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation.rb:255:in `records'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation/batches.rb:216:in `block in in_batches'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation/batches.rb:214:in `loop'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation/batches.rb:214:in `in_batches'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation/batches.rb:128:in `find_in_batches'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activerecord-5.1.5/lib/active_record/relation/batches.rb:62:in `find_each'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/data-anonymization-161ddb329293/lib/strategy/base.rb:132:in `process_table_in_batches'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/data-anonymization-161ddb329293/lib/strategy/base.rb:105:in `process'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/data-anonymization-161ddb329293/lib/core/database.rb:68:in `block in anonymize'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/data-anonymization-161ddb329293/lib/core/database.rb:66:in `each'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/data-anonymization-161ddb329293/lib/core/database.rb:66:in `anonymize'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/data-anonymization-161ddb329293/lib/core/database.rb:46:in `anonymize'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/data-anonymization-161ddb329293/lib/core/dsl.rb:10:in `database'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/switcher-common-0029bd85b8fc/lib/switcher/anonymize.rb:55:in `execute'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/switcher-common-0029bd85b8fc/lib/switcher/anonymize.rb:45:in `run'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/switcher-common-0029bd85b8fc/lib/switcher/anonymize.rb:11:in `run'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/bundler/gems/switcher-common-0029bd85b8fc/lib/switcher/tasks/anonymize.rake:9:in `block in <top (required)>'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/task.rb:251:in `block in execute'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/task.rb:251:in `each'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/task.rb:251:in `execute'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/bugsnag-5.3.3/lib/bugsnag/rake.rb:12:in `execute_with_bugsnag'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/task.rb:195:in `block in invoke_with_call_chain'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/task.rb:188:in `invoke_with_call_chain'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/task.rb:181:in `invoke'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:160:in `invoke_task'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:116:in `block (2 levels) in top_level'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:116:in `each'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:116:in `block in top_level'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:125:in `run_with_threads'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:110:in `top_level'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:83:in `block in run'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:186:in `standard_exception_handling'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/rake-12.3.0/lib/rake/application.rb:80:in `run'", "/Users/Olly/Work/Switcher/broadband-leads/bin/rake:6:in `<top (required)>'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/dependencies.rb:286:in `load'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/dependencies.rb:286:in `block in load'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/dependencies.rb:258:in `load_dependency'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/activesupport-5.1.5/lib/active_support/dependencies.rb:286:in `load'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'", "/usr/local/var/rbenv/versions/2.3.1/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'", "-e:1:in `<main>'"]

@olly
Copy link

olly commented Mar 6, 2018

I managed to fix this by altering SelectFromDatabase. I believe the root cause is that SelectFromDatabase tries to collect it's values before the table is fully configured, and there's no primary key. When rails tries to reorder as part of the in_batches method, it tries to use a nil primary key.

See: https://gist.github.com/olly/6388e7db348d1023e340109ea9ce0362

@sunitparekh
Copy link
Owner

@olly thanks for sending patch to fix issues. I merged your patch and released gem version to v0.8.2
thanks a lot once again for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants