Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use admin IDs to resolve entities during import, add validation for conflicts in entity maps #777

Merged
merged 19 commits into from
Sep 27, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 38 additions & 22 deletions sfm_pc/management/commands/import_google_doc.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,17 +51,31 @@

class EntityMap(dict):
'''
Container for mapping of UUID to array of (name, column, row, sheet) tuples.
Container for mapping of UUID to value and locations:

{
'uuid': {
'value': [
(column, row, sheet),
...
]
}
}
'''

KEY_TYPE = 'UUID'
VALUE_TYPE = 'name'

def add(self, key, value, column, row, sheet):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this might be a bit easier to reason about if you implemented this as a nested dict

__init__(self):
    _d = {}
...
if key not in self:
    self._d[key] = {}

location = (column, row, sheet)
if value not in self[key]:
    self_d[key][value] = [location]
else:
    self._d[key][value].append(location)

return self

if you need the data back in a particular way you are getting it now you could
implement this dunder method

def __getitem__(self, key):
    return [(k,) + values for k, values in self._d[key].items()]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm finding this suggestion much harder to parse. I originally implemented the map as an object with a dictionary attribute here, e182853, but I found that most times I accessed self, I was accessing self.map (the dictionary), so it felt more direct for EntityMap to be a dictionary itself. Is this a change you require?

Copy link
Collaborator

@fgregg fgregg Sep 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you effectively do have a nested structure since the logic, as, I understand it is to have

key
|_ value
    |_ location

which you sometimes want to transpose to another nested structure:

value
|_ key
    |_ location

i think it's going to be easier to work with this in the future if your data structure reflects that nesting.

so, i do want to see an implementation that has that nesting.

I do not require that you implement this with a dictionary attribute. that's only necessary if you need to flattened results back as in the example __getitem__ if you don't need that flattening, you could do.

if key not in self:
    self[key] = {}

location = (column, row, sheet)
if value not in self[key]:
    self[key][value] = [location]
else:
    self[key][value].append(location)

return self

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying! I missed the substance of your first comment. 😅 Refactored in ad9d10b.

if key not in self:
self[key] = []
self[key] = {}

if value not in self[key]:
self[key][value] = []

location = (column, row, sheet)

value_tuple = (value, column, row, sheet)
self[key].append(value_tuple)
self[key][value].append(location)

return self

Expand All @@ -76,9 +90,9 @@ def get_transposed(self):
transposed = EntityMap()

for key, values in self.items():
for value_tuple in values:
value, *vargs = value_tuple
transposed = transposed.add(value, key, *vargs)
for value, locations in values.items():
for location in locations:
transposed = transposed.add(value, key, *location)

return transposed

Expand All @@ -90,7 +104,7 @@ def get_conflicts(self, transpose=False):
entity_map = self if not transpose else self.get_transposed()

for key, values in entity_map.items():
if len(set(val for val, *_ in values)) > 1:
if len(set(val for val in values.keys())) > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be if len(values) > 1 since values (as keys to a dictionary) are guaranteed to be unique

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thank you and donezo.

yield key, values

def get_key_value_types(self, transpose=False):
Expand Down Expand Up @@ -273,21 +287,23 @@ def log_conflicts(self, entity_type, entity_map, transpose=False):
'"{column}".'
)

for key, values in entity_map.get_conflicts(transpose=transpose):
for value in sorted(values, key=lambda value: value[2]):
value, column, row, sheet = value
key_type, value_type = entity_map.get_key_value_types(transpose=transpose)

msg = error_format.format(
entity_type=entity_type,
key_type=key_type,
key_value=key,
value_type=value_type,
value=value,
column=column
)
key_type, value_type = entity_map.get_key_value_types(transpose=transpose)

self.log_error(msg, sheet=sheet, current_row=row)
for key, values in entity_map.get_conflicts(transpose=transpose):
for value, locations in values.items():
for column, row, sheet in sorted(locations,
key=lambda location: location[1]):

msg = error_format.format(
entity_type=entity_type,
key_type=key_type,
key_value=key,
value_type=value_type,
value=value,
column=column
)

self.log_error(msg, sheet=sheet, current_row=row)

def create_locations(self):
this_dir = os.path.abspath(os.path.dirname(__file__))
Expand Down