Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NickAkhmetov/CAT-888 Improve combined provenance logic to safeguard against duplicate nodes/edges #3535

Merged
merged 5 commits into from
Sep 10, 2024

Conversation

NickAkhmetov
Copy link
Collaborator

Summary

This PR reinforces the logic used to combine provenance graphs to guard against duplicate edges more effectively. Since edge IDs are not globally unique, collisions in edges don't necessarily mean they're referring to the same object - however, this led to cases where two edges with unique ID's led to duplicate connections between nodes, which caused the cwlInput conversion to throw errors.

With these changes, duplicate edges should no longer propagate to the graph from the combining logic. I've also improved the error thrown by the makeCwlInput function so that it is easier to track down exactly what's going wrong.

Design Documentation/Original Tickets

https://hms-dbmi.atlassian.net/browse/CAT-888

Testing

Manually tested with the dataset that this issue was occuring on:

Screenshots/Video

image

Checklist

  • Code follows the project's coding standards
    • Lint checks pass locally
    • New CHANGELOG-your-feature-name-here.md is present in the root directory, describing the change(s) in full sentences.
  • Unit tests covering the new feature have been added
  • All existing tests pass
  • Any relevant documentation in JIRA/Confluence has been updated to reflect the new feature
  • Any new functionalities have appropriate analytics functionalities added

john-conroy
john-conroy previously approved these changes Sep 9, 2024
Copy link
Collaborator

@john-conroy john-conroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to block the fix, but I'm curious why this case necessitates type assertion?

Comment on lines 13 to 28
Object.keys(obj1).forEach((key) => {
// @ts-expect-error -- We're specifically checking for the key in obj2
if (!(key in obj2) || obj1[key] !== obj2[key]) {
diffKeys.push(key);
}
});
Object.keys(obj2).forEach((key) => {
// @ts-expect-error -- We're specifically checking for the key in obj1
if (!(key in obj1) || obj1[key] !== obj2[key]) {
if (!diffKeys.includes(key)) {
diffKeys.push(key);
}
}
});
return diffKeys;
}
Copy link
Collaborator

@john-conroy john-conroy Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just brainstorming a potentially simpler method. Feel free to take or leave.

const diffKeys = Object.keys({...obj1, ...obj2}).reduce((acc, k) => {
 const v1 = obj1?.[k];
 const v2 = obj2?.[k];

if(v1 === v2){
   return acc;
}

return [...acc, k];
}, [])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good call. I was definitely overthinking this.

@@ -8,6 +8,27 @@ export function createProvDataURL(uuid: string, entityEndpoint: string) {
return `${entityEndpoint}/entities/${uuid}/provenance`;
}

function findDifference(obj1: object, obj2: object) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why object as the types here rather than something like Record<string, unknown>? Is it because the later used merged[key] is unknown and we can only use a type guard for object?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generified this to match nonDestructiveMerge, thanks for pointing out the improvement!

@@ -21,11 +42,35 @@ export function nonDestructiveMerge<T extends Record<string, unknown>, U extends
): T {
const merged = { ...a } as Record<string, unknown>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide stricter types for the provenance data?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types are currently provided via the generics, i.e. this function gets called with ProvData objects, then gets cast back to the ProvData type at the end with the return merged as T. Prior to adding this most recent prov-only logic, this was a more generic method we could reuse elsewhere; if we end up needing a similar functionality in the future, we could generify this approach by extracting the prov-specific logic into a function param.

@@ -8,6 +8,18 @@ export function createProvDataURL(uuid: string, entityEndpoint: string) {
return `${entityEndpoint}/entities/${uuid}/provenance`;
}

function findDifference<T extends Record<string, unknown>, U extends Record<string, unknown>>(obj1: T, obj2: U) {
const diffKeys = Object.keys({ ...obj1, ...obj2 }).reduce((acc, key) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const diffKeys = Object.keys({ ...obj1, ...obj2 }).reduce((acc, key) => {
const diffKeys = Object.keys({ ...obj1, ...obj2 }).reduce<string[]>((acc, key) => {

Would this avoid the assertion below?

Copy link
Collaborator

@john-conroy john-conroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

@NickAkhmetov NickAkhmetov merged commit 31fdf77 into main Sep 10, 2024
8 checks passed
@NickAkhmetov NickAkhmetov deleted the nickakhmetov/cat-888 branch September 10, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants