NickAkhmetov/CAT-888 Improve combined provenance logic to safeguard against duplicate nodes/edges #3535

Merged
merged 5 commits into from
Sep 10, 2024
Changes from 4 commits
1 change: 1 addition & 0 deletions CHANGELOG-cat-888.md
@@ -0,0 +1 @@
- Improve combined provenance logic to safeguard against duplicate nodes/edges.
@@ -21,7 +21,7 @@ export function makeCwlInput(name: string, steps: unknown[], extras: unknown, is
   ];
   if (steps) {
     if (steps.length > 1) {
-      throw new Error('Limited to 1 step');
+      throw new Error(`Limited to 1 input source step, ${name} got: ${JSON.stringify(steps)}`);
     } else if (steps.length === 1) {
       [source[0].step] = steps as unknown as string;
     }
44 changes: 43 additions & 1 deletion context/app/static/js/components/detailPage/provenance/utils.ts
@@ -8,6 +8,18 @@ export function createProvDataURL(uuid: string, entityEndpoint: string) {
  return `${entityEndpoint}/entities/${uuid}/provenance`;
}

function findDifference<T extends Record<string, unknown>, U extends Record<string, unknown>>(obj1: T, obj2: U) {
  const diffKeys = Object.keys({ ...obj1, ...obj2 }).reduce((acc, key) => {

Collaborator

Suggested change
- const diffKeys = Object.keys({ ...obj1, ...obj2 }).reduce((acc, key) => {
+ const diffKeys = Object.keys({ ...obj1, ...obj2 }).reduce<string[]>((acc, key) => {

Would this avoid the assertion below?

    if (key in obj1 && key in obj2 && obj1[key] === obj2[key]) {
      return acc;
    }
    return [...acc, key];
  }, [] as string[]);
  return diffKeys;
}

const ignoredFields = ['hubmap:last_modified_timestamp', 'hubmap:published_timestamp', 'hubmap:created_timestamp'];
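
To make the review suggestion above concrete, here is a minimal sketch of the same comparison with the accumulator typed through the generic parameter of `reduce`, so the `[] as string[]` assertion is no longer needed. The names `findDifferenceSketch` and `ignoredFieldsSketch`, and the sample records (including the `hubmap:uuid` key), are hypothetical and exist only for illustration; this is not necessarily the form that was merged.

```typescript
// Sketch only: mirrors findDifference from the diff, but types the accumulator
// with reduce<string[]> instead of asserting `[] as string[]`.
function findDifferenceSketch(
  obj1: Record<string, unknown>,
  obj2: Record<string, unknown>,
): string[] {
  return Object.keys({ ...obj1, ...obj2 }).reduce<string[]>((acc, key) => {
    // A key is "the same" only if both objects have it with strictly equal values.
    if (key in obj1 && key in obj2 && obj1[key] === obj2[key]) {
      return acc;
    }
    return [...acc, key];
  }, []);
}

// Same field list as in the diff, renamed to avoid clashing with the real constant.
const ignoredFieldsSketch = [
  'hubmap:last_modified_timestamp',
  'hubmap:published_timestamp',
  'hubmap:created_timestamp',
];

// Hypothetical sample records that differ only in an ignored timestamp.
const first = { 'hubmap:uuid': 'abc', 'hubmap:last_modified_timestamp': '2024-09-01' };
const second = { 'hubmap:uuid': 'abc', 'hubmap:last_modified_timestamp': '2024-09-10' };

// Keys whose values differ, before and after dropping the ignored fields.
const rawDifference = findDifferenceSketch(first, second);
const meaningfulDifference = rawDifference.filter((field) => !ignoredFieldsSketch.includes(field));
```

Because the comparison uses strict equality, nested objects or arrays with identical contents are still reported as different; only top-level primitive values are compared by value.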

/**
 * Combines ProvData objects without overwriting properties or inserting unnecessary duplicates.
 * @param a the first ProvData object
@@ -21,11 +33,41 @@ export function nonDestructiveMerge<T extends Record<string, unknown>, U extends
): T {
  const merged = { ...a } as Record<string, unknown>;

Collaborator

Can we provide stricter types for the provenance data?

Collaborator Author

The types are currently provided via the generics: this function is called with ProvData objects, and the result is cast back to the ProvData type at the end with "return merged as T". Before the most recent prov-only logic was added, this was a more generic method that we could reuse elsewhere. If we need similar functionality in the future, we could generalize this approach by extracting the prov-specific logic into a function parameter.

  Object.entries(b).forEach(([key, value]) => {
    // Check for duplicate provenance edges by comparing entity and activity IDs
    if (value && typeof value === 'object' && 'prov:entity' in value && 'prov:activity' in value) {
      if (
        Object.values(merged).some(
          (v) =>
            typeof v === 'object' &&
            v &&
            'prov:entity' in v &&
            'prov:activity' in v &&
            v['prov:entity'] === value['prov:entity'] &&
            v['prov:activity'] === value['prov:activity'],
        )
      ) {
        return;
      }
    }

    if (merged[key]) {
      // Confirm that properties are actually different
      // Heuristic to check if the objects are the same
      if (JSON.stringify(merged[key]) === JSON.stringify(value)) {
        return;
      }

      if (merged[key] && typeof merged[key] === 'object' && value && typeof value === 'object') {
        // Perform a more in-depth check to see if objects are the same when excluding certain fields
        const difference = findDifference(
          merged[key] as Record<string, unknown>,
          value as Record<string, unknown>,
        ).filter((diffField) => !ignoredFields.includes(diffField));

        if (difference.length === 0) {
          return;
        }
      }

      merged[`${key}-${keyMod}`] = value;
    } else {
      merged[key] = value;
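
The author's reply in the review thread mentions that the merge could be made generic again by passing the prov-specific logic in as a function parameter. A rough sketch of that idea follows. The names `mergeWithCheck`, `DuplicateCheck`, `isRecord`, and `isSameProvEdge`, as well as the sample data, are hypothetical and not part of this PR; the ignored-timestamp comparison from the diff is left out to keep the example short.

```typescript
type Dict = Record<string, unknown>;

// Hypothetical predicate: true when `candidate` duplicates a value already merged.
type DuplicateCheck = (existing: unknown, candidate: unknown) => boolean;

function isRecord(value: unknown): value is Dict {
  return typeof value === 'object' && value !== null;
}

// Prov-specific check, modeled on the diff: two edges count as duplicates when
// both carry `prov:entity` and `prov:activity` and those values match.
const isSameProvEdge: DuplicateCheck = (existing, candidate) =>
  isRecord(existing) &&
  isRecord(candidate) &&
  'prov:entity' in existing &&
  'prov:activity' in existing &&
  'prov:entity' in candidate &&
  'prov:activity' in candidate &&
  existing['prov:entity'] === candidate['prov:entity'] &&
  existing['prov:activity'] === candidate['prov:activity'];

// Generic merge: the caller decides what "duplicate" means via `isDuplicate`.
function mergeWithCheck<T extends Dict, U extends Dict>(
  a: T,
  b: U,
  keyMod: string,
  isDuplicate: DuplicateCheck,
): T {
  const merged: Dict = { ...a };
  Object.entries(b).forEach(([key, value]) => {
    // Skip values that duplicate something already present under any key.
    if (Object.values(merged).some((existing) => isDuplicate(existing, value))) {
      return;
    }
    if (merged[key]) {
      // Same key, same serialized content: nothing to add.
      if (JSON.stringify(merged[key]) === JSON.stringify(value)) {
        return;
      }
      // Same key, different content: keep both by suffixing the new key.
      merged[`${key}-${keyMod}`] = value;
    } else {
      merged[key] = value;
    }
  });
  return merged as T;
}

// Hypothetical sample data.
const left = { edge1: { 'prov:entity': 'e1', 'prov:activity': 'a1' } };
const right = {
  edge9: { 'prov:entity': 'e1', 'prov:activity': 'a1' }, // same edge under another key
  edge1: { 'prov:entity': 'e2', 'prov:activity': 'a1' }, // key collision, different edge
};

const combined = mergeWithCheck(left, right, 'right', isSameProvEdge);
const combinedKeys = Object.keys(combined);
```

With this sample data, `edge9` is skipped as a duplicate edge, while the colliding `edge1` from `right` is kept under the suffixed key `edge1-right`. Passing a predicate that always returns false would reduce the function to the key-collision handling alone.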