fix(#9601): prototype duplicate prevention #9609

Merged
jkuester merged 7 commits into medic:master from ChinHairSaintClair:duplicate-prevention
Apr 18, 2025

Conversation

@ChinHairSaintClair
Contributor

@ChinHairSaintClair ChinHairSaintClair commented Nov 4, 2024

Description

This feature prevents duplicate sibling contacts from being created within the hierarchy, as discussed with @jkuester and @mrjones-plip in a "technical working session" and through interactions on the linked issue thread.

To achieve this, we hook into duplicate detection strategies through "configuration", update the saveContact method in the form.service.ts file to include an additional check, and display potential duplicates in a duplicate_info section added to the enketo.component.html file. Each duplicate is displayed as a card with expandable/collapsible segments, accompanied by an "acknowledgement" prompt to allow form submission.

Configuration:

{
   "expression":"levenshteinEq(3, current.name, existing.name)"
}

Currently two strategies are supported: levenshteinEq and normalizedLevenshteinEq, with the ability to customize properties based on implementation needs.
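
To make the two strategies concrete, here is a minimal sketch of how they could behave. It is not the code from this PR: the `levenshtein` helper is hypothetical, and the comparison semantics (distance at most the threshold, and distance divided by the longer string's length for the normalized variant) are assumptions.

```typescript
// Classic dynamic-programming edit distance between two strings.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: "equal" means the edit distance is at most `threshold`.
function levenshteinEq(threshold: number, current: string, existing: string): boolean {
  return levenshtein(current, existing) <= threshold;
}

// Assumption: the distance is divided by the longer string's length,
// giving a score between 0 and 1 that is compared against `threshold`.
function normalizedLevenshteinEq(threshold: number, current: string, existing: string): boolean {
  const longest = Math.max(current.length, existing.length);
  if (longest === 0) {
    return true; // two empty strings are identical
  }
  return levenshtein(current, existing) / longest <= threshold;
}
```

Under these assumptions the absolute variant suits short, fixed-format fields, while the normalized variant scales the tolerance with the length of the value.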

Example implementation:

{
  "title": [
    {
      "locale": "en",
      "content": "New Household"
    }
  ],
  "icon": "household-1",
  "context": {
    "expression": "contact.type === 'Indawo'",
    "permission": "can_register_household",
    "duplicate_check": {
      "expression": "levenshteinEq(4, current.name, existing.name)"
    }
  }
}

Here, the duplicate_check.expression defines the logic for comparing the current record with each of its siblings. If no duplicate_check is provided, the system defaults to evaluating the name field. For example:

  • For a "location", you might match on street_number, street_name, and postal_code:
"expression": "current.street_number === existing.street_number && levenshteinEq(3, current.street_name, existing.street_name) && current.postal_code === existing.postal_code"
  • For a "person", you might match on name, sex, and date_of_birth:
"expression": "levenshteinEq(3, current.name, existing.name) && current.sex === existing.sex && current.date_of_birth === existing.date_of_birth"

In these expressions, current refers to the created/edited form, while existing refers to a "sibling" loaded from the database.
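
As a rough illustration of the current/existing relationship, the check can be thought of as running a predicate over every sibling. The sketch below is not the PR's implementation: `ContactDoc`, `DuplicatePredicate` and `findDuplicates` are hypothetical names, the predicate stands in for an already-parsed expression, and skipping the contact's own saved doc when editing is an assumption.

```typescript
interface ContactDoc {
  _id?: string;
  name?: string;
  [field: string]: unknown;
}

// Stand-in for a parsed duplicate_check expression.
type DuplicatePredicate = (current: ContactDoc, existing: ContactDoc) => boolean;

function findDuplicates(
  current: ContactDoc,
  siblings: ContactDoc[],
  isDuplicate: DuplicatePredicate
): ContactDoc[] {
  return siblings
    // Assumption: when editing, the contact's own saved doc is not its duplicate.
    .filter(existing => current._id === undefined || existing._id !== current._id)
    .filter(existing => isDuplicate(current, existing));
}
```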

Conditional Duplicate Check:

This is a key requirement. For now, the is_canonical form question will be used to control conditional duplicate checking. When the backend flags a record as a duplicate, the CHW can mark the record accordingly. Downstream, this will allow us to merge or delete the record based on the specified action.

Opt-out:

Use the following configuration to disable the duplicate-checking functionality for a specific form:

{
   "duplicate_check":{
      "disabled":true
   }
}
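
Taken together, the default and opt-out rules could be resolved along these lines. This is only a sketch: the interface and function names are hypothetical, and the default expression (including its threshold of 3) is an assumption, since the description only says the name field is evaluated by default.

```typescript
interface DuplicateCheckConfig {
  expression?: string;
  disabled?: boolean;
}

interface FormContext {
  expression?: string;
  permission?: string;
  duplicate_check?: DuplicateCheckConfig;
}

// Hypothetical default; the threshold value is assumed for illustration.
const DEFAULT_EXPRESSION = 'levenshteinEq(3, current.name, existing.name)';

// Returns the expression to evaluate, or null when checking is disabled.
function resolveDuplicateExpression(context: FormContext | undefined): string | null {
  const check = context?.duplicate_check;
  if (check?.disabled) {
    return null;
  }
  return check?.expression ?? DEFAULT_EXPRESSION;
}
```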

Misc:

We use the CHT provided medic-client/contacts_by_parent view to query for siblings.

@kennsippell, since we've touched on the duplicate topic before, it would be great to get your thoughts on this as well.

Issue
Closes #9601

Code review checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or e2e where appropriate
  • Internationalised: All user facing text
  • Backwards compatible: Works with existing data and configuration or includes a migration. Any breaking changes documented in the release notes.

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

Comment thread webapp/src/ts/main.ts Outdated
queryParams: {
valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'],
// eslint-disable-next-line eqeqeq
query: (duplicate, action) => duplicate === 'yes' && action != null
Contributor

@fardarter fardarter Nov 13, 2024

@jkuester @ChinHairSaintClair My feeling here is that this would need to end up with some kind of operator syntax support that lets the user do logic, maybe something like (though doesn't have to be) JsonLogic: https://jsonlogic.com/

The way I see it there are two points of config -- the excel files and the configuration json -- and probably neither should take serialised JS (?).

If a key lookup returned the key and its value together (as a tuple or object), we could let the user express logic maybe like this, where the value found at each key is tested against the value provided (with) using the operator (op):

let andOnly = {
  logic: [
    [
      // array bracket denotes a logical grouping
      {
        key1: { with: value, op: $in },
        key2: { with: value, op: $contains },
        key3: { with: value, op: $eq },
      },
      {
        key1: { with: value, op: $in },
        key2: { with: value, op: $ne },
        key3: { with: value, op: $startsWith },
      },
    ],
  ],
};

let orWithNestedAnds = {
  logic: [
    [
      {
        key1: { with: value, op: $in },
      },
      {
        key2: { with: value, op: $eq },
      },
    ],
    // each grouping in its own bracket
    [
      {
        key1: { with: value, op: $in },
      },
      {
        key2: { with: value, op: $in },
      },
    ],
  ],
};

let orWithNestedOrWithNestedAnds = {
  logic: [
    [
      {
        key1: { with: value, op: $in },
      },
      {
        key2: { with: value, op: $in },
      },
    ],
    // OR
    [
      [
        // nested groupings acceptable
        {
          key1: { with: value, op: $ne },
        },
        // AND
        {
          key2: { with: value, op: $ne },
        },
      ],
      // OR
      [
        {
          key1: { with: value, op: $contains },
        },
        // AND
        {
          key1: { with: value, op: $startsWith },
        },
      ],
    ],
  ],
};

The other stuff seems easier to move to config.
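
For illustration, one way such a structure could be evaluated is sketched below. It rests on a particular reading of the examples above, which is an assumption: an array of arrays means OR, an array of condition objects means AND, and multiple keys inside one object are also ANDed. Operators are written as strings, and all type and function names are hypothetical.

```typescript
type Op = '$eq' | '$ne' | '$in' | '$contains' | '$startsWith';
interface Condition { with: unknown; op: Op }
type ConditionSet = { [key: string]: Condition };
type Group = ConditionSet[] | Group[];

// Tests the value found at a key against the provided value using the operator.
function testCondition(actual: unknown, cond: Condition): boolean {
  const expected = cond.with;
  switch (cond.op) {
    case '$eq': return actual === expected;
    case '$ne': return actual !== expected;
    case '$in': return Array.isArray(expected) && expected.includes(actual);
    case '$contains':
      return typeof actual === 'string' && typeof expected === 'string' && actual.includes(expected);
    case '$startsWith':
      return typeof actual === 'string' && typeof expected === 'string' && actual.startsWith(expected);
    default: return false;
  }
}

// Every key in one condition object must match (AND).
function matchesSet(doc: Record<string, unknown>, set: ConditionSet): boolean {
  return Object.entries(set).every(([key, cond]) => testCondition(doc[key], cond));
}

function evaluate(doc: Record<string, unknown>, group: Group): boolean {
  if (Array.isArray(group[0])) {
    // Array of arrays: any nested grouping may match (OR).
    return (group as Group[]).some(inner => evaluate(doc, inner));
  }
  // Array of condition objects: all of them must match (AND).
  return (group as ConditionSet[]).every(set => matchesSet(doc, set));
}
```

With this reading, the value of `logic` in each example can be passed straight to `evaluate` alongside the doc being tested.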

Comment thread webapp/src/ts/polyfills.ts Outdated
EnketoForm:any;
_phdcChanges: { // Additional namespace
// Specify your own contact_types here
hierarchyDuplicatePrevention: Partial<{[key in 'person' | 'health_center']: Strategy;}>;
Contributor

@jkuester @ChinHairSaintClair Following on from my above comment:

The way I see it there are two points of config -- the excel files and the configuration json.

We probably couldn't expect the user to do config here. I agree the type safety is lovely and I'm not sure whether it's possible to generate a type from the config, but if not I don't see how we can keep the type safety and still maintain the existing configuration contract.

};
}

private parseXmlForm(form): Document | undefined {
Contributor

I'd move as much of this sort of stuff as is possible to a lib/file so it can be unit tested easily and then just called internally. Not sure if private is needed here.

(I know this was just a conceptual prototype not productionised, but for forward looking feedback.)

}
}

return count > 0 ? totalScore / count : null;
Contributor

Does the alternative return type need to be null or is there an appropriate default value of type int?

}

// Promise.allSettled is not available due to the app's javascript version
private allSettledFallback(promises: Promise<Exclude<any, null | undefined>>[]): Promise<ReturnType[]> {
Contributor

@jkuester @ChinHairSaintClair Would Promise.allSettled not be transpiled to a target version?
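
For context, a fallback of this kind can be built on Promise.all by converting every outcome into a result object. The sketch below is not the PR's allSettledFallback body; the `Settled` type is a hypothetical name that mirrors the shape of the standard Promise.allSettled results.

```typescript
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

function allSettledFallback<T>(promises: Promise<T>[]): Promise<Settled<T>[]> {
  return Promise.all(
    promises.map(promise =>
      // Map both outcomes to a value so that Promise.all never rejects early.
      promise.then(
        (value): Settled<T> => ({ status: 'fulfilled', value }),
        (reason: unknown): Settled<T> => ({ status: 'rejected', reason })
      )
    )
  );
}
```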

const $duplicateInfoElement = $('#contact-form').find('#duplicate_info');
$duplicateInfoElement.empty(); // Remove all child nodes
$duplicateInfoElement.show();
// TODO: create a template component where these values are fed into.
Contributor

Does this need DOM mutation? Can the angular template not handle it?

count++;
}
$duplicateInfoElement.append(content);
$duplicateInfoElement.on('click', '.duplicate-navigate-link', () => {
Contributor

No anon functions on event listeners please. They don't get properly cleaned up by the browser. They need to be named.
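
One reading of this concern is that an inline anonymous handler leaves no reference to pass to removeEventListener later. A minimal sketch of the named-handler pattern follows, using the standard EventTarget API rather than the jQuery calls in the diff; `clickCount`, `onDuplicateLinkClick` and `attach` are hypothetical names.

```typescript
let clickCount = 0;

// A named handler keeps a reference that can be removed later.
function onDuplicateLinkClick(): void {
  clickCount++;
}

function attach(target: EventTarget): () => void {
  target.addEventListener('click', onDuplicateLinkClick);
  // Return a teardown function, for example to call when the component is destroyed.
  return () => target.removeEventListener('click', onDuplicateLinkClick);
}
```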

Comment thread webapp/src/ts/polyfills.ts Outdated

export const NormalizedLevenshtein: Strategy = {
type: 'NormalizedLevenshtein',
threshold: 0.334,
Contributor

You'll probably want to set this in config

formService.saveContact.resolves({ docId: 'new_clinic_id' });

// TODO: figure out why this test's dbLookupRef is null despite being set in the beforeEach
component.dbLookupRef = Promise.resolve({
Contributor

You may want sinon.resolves

Comment thread webapp/src/ts/main.ts Outdated
{form_prop_path: `/data/health_center/name`, db_doc_ref: 'name'},
{form_prop_path: '/data/health_center/external_id', db_doc_ref: 'external_id'}
],
queryParams: {
Contributor

maybe formQuestion? Name is still bothering me.

Comment thread webapp/src/ts/main.ts Outdated
health_center: {
...Levenshtein,
props: [
{form_prop_path: `/data/health_center/name`, db_doc_ref: 'name'},
Contributor

camelCase is the js standard.

Comment thread webapp/src/ts/main.ts Outdated
],
queryParams: {
valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'],
// eslint-disable-next-line eqeqeq
Contributor

@jkuester Continuing to lobby for != as the most excellent exception to the strict equality lint rule. Checks for both null and undefined.
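
The behaviour being lobbied for can be shown with a small type guard; `isPresent` is a hypothetical helper name used only for this sketch.

```typescript
function isPresent<T>(value: T | null | undefined): value is T {
  // Loose inequality against null is false for both null and undefined,
  // and true for every other value, including 0, '' and false.
  // eslint-disable-next-line eqeqeq
  return value != null;
}
```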

@ChinHairSaintClair ChinHairSaintClair force-pushed the duplicate-prevention branch 2 times, most recently from 5442b21 to 9eb471f Compare January 15, 2025 15:48
Contributor

@jkuester jkuester left a comment

Wow, this is amazing! I think we are 100% on the right track here and I am excited to get this functionality released and in the hands of users.

I have not had a chance to look at all the changes yet, but I wanted to go ahead and share my comments so far (mostly focused on the internal service stuff and have not looked much yet at the UI code). Mostly just a bunch of minor questions/suggestions.

I provided some suggestions regarding the Sonar complaints for hasOwnProperty, but I have not had a chance to dig into the circular navigation issue yet.

Thank you for the great PR and the patience to work with us and get this across the finish line!

(FYI, I am following conventional comments for my PR comments, so that is where the comment prefixes are coming from...)

Comment thread webapp/src/ts/services/xml-forms-context-utils.service.ts Outdated
Comment thread webapp/src/ts/services/xml-forms-context-utils.service.ts Outdated
Comment thread webapp/src/ts/services/xml-forms-context-utils.service.ts Outdated
Comment thread webapp/src/ts/services/utils/deduplicate.ts Outdated
Comment thread webapp/src/ts/services/utils/deduplicate.ts Outdated
Comment thread webapp/src/ts/services/form.service.ts Outdated
Comment thread webapp/src/ts/services/form.service.ts Outdated
Comment thread webapp/src/ts/services/form.service.ts Outdated
Comment thread webapp/src/ts/services/form.service.ts Outdated
Comment thread webapp/src/ts/services/form.service.ts Outdated
Contributor

@jkuester jkuester left a comment

Okay, I made it through the rest of the files. Added a few more comments, but all in all this is good stuff! I really appreciate the quality unit tests!

I am going to make a post out on the forum that demos the current "Duplicates Found" UI we have here. The idea would be to maybe crowd-source some UX design in case other folks have input on the look/feel/function.

Finally, before we merge this code (but probably after we finalize the UX) we will need to add some e2e tests. Since there are a lot of moving parts here (config + db docs + forms), we should probably be thorough in validating the main flows. My current thought is to add some new test cases to tests/e2e/default/contacts/edit.wdio-spec.js and maybe create a new tests/e2e/default/contacts/create.wdio-spec.js.

Comment thread webapp/src/ts/modules/contacts/contacts-edit.component.ts Outdated
Comment thread webapp/tests/karma/ts/services/utils/deduplicate.spec.ts Outdated
Comment thread webapp/tests/karma/ts/services/utils/deduplicate.spec.ts Outdated
Comment thread webapp/src/ts/services/utils/deduplicate.ts Outdated
Comment thread webapp/tests/karma/ts/services/form.service.spec.ts Outdated
@jkuester
Contributor

Okay, I created a forum post demoing some of the functionality we are building here. Would love any UX feedback from your team on the look/feel/functionality! 🙏

@jkuester
Contributor

jkuester commented Mar 7, 2025

Sorry for the delayed response here! I will not be able to have a look at the revisions this week, but I plan to do so early next week. 👍

Contributor

@jkuester jkuester left a comment

Things are coming along great here! 🤩

Comment thread webapp/src/ts/services/xml-forms-context-utils.service.ts Outdated
Comment thread webapp/src/ts/services/xml-forms-context-utils.service.ts Outdated
Comment thread webapp/src/ts/services/xml-forms-context-utils.service.ts Outdated
Comment thread webapp/tests/karma/ts/services/xml-forms-context-utils.service.spec.ts Outdated
Comment thread webapp/tests/karma/ts/services/xml-forms-context-utils.service.spec.ts Outdated
Comment thread webapp/src/ts/components/duplicate-info/duplicate-info.component.html Outdated
Comment thread api/resources/translations/messages-en.properties Outdated
Comment thread api/resources/translations/messages-en.properties Outdated
Comment thread api/resources/translations/messages-en.properties Outdated
@ChinHairSaintClair
Contributor Author

ChinHairSaintClair commented Mar 21, 2025

I've pushed up the UI changes, translation strings, duplicate-contacts expand/collapse functionality, and the shift of state management to the child component for your review @jkuester. I'll finish up the deduplication and form service next week.

Contributor

@jkuester jkuester left a comment

This is coming along great! Thank you for the updates!

Comment thread webapp/src/ts/services/contacts.service.ts Outdated
Comment thread webapp/src/ts/services/contacts.service.ts Outdated
Comment thread webapp/src/ts/services/deduplicate.service.ts Outdated
Comment thread webapp/src/ts/services/form.service.ts Outdated
Comment thread webapp/tests/karma/ts/services/form.service.spec.ts Outdated
Comment thread webapp/src/ts/components/duplicate-contacts/duplicate-contacts.component.ts Outdated
Comment thread webapp/src/css/enketo/medic.less
Comment thread webapp/src/ts/modules/contacts/contacts-edit.component.ts
Comment thread webapp/src/ts/components/duplicate-contacts/duplicate-contacts.component.ts Outdated
@ChinHairSaintClair
Contributor Author

@jkuester I ran into a bug in the top-level place duplicate check. Create two top-level places with the name 'test'. On the second creation, you will NOT be prompted with a duplicate check. Logging out/in fixes this. I believe this might be related to the cache used in the get function. Do you know of an effective workaround? Lower-level checks still work as expected.

@jkuester
Contributor

Oh wow @ChinHairSaintClair this is a good find! After debugging this, it seems like the cache invalidation for the ContactsService is just broken in general.... 🤦 You can re-create similar behavior (where the cache is not busted) by adding a new top-level place and then going to the Reports tab and trying to filter on "Place". Your new top-level place will not show up in the list of places until after you log out and log back in.... 😓

The good news is that the fix should be simple. Basically, the invalidate function for the cache is actually called with a change object, not the actual doc itself. In most cases, the change object will have a doc property that does contain the doc that changed. We just need to call contactTypesService.getTypeId with that inner doc instead of with the change object.

So, line 39 in the contacts.service.ts just becomes:

            invalidate: ({ doc }) => type.id === this.contactTypesService.getTypeId(doc),
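
To spell out why the original callback never matched, here is a small self-contained sketch. The `Doc` and `Change` shapes and the local `getTypeId` are simplified stand-ins (assumptions) for the real change feed object and contactTypesService.getTypeId.

```typescript
interface Doc { _id: string; type?: string; contact_type?: string }
interface Change { id: string; doc?: Doc }

// Simplified stand-in for contactTypesService.getTypeId.
function getTypeId(doc?: Doc): string | undefined {
  if (!doc) {
    return undefined;
  }
  return doc.type === 'contact' ? doc.contact_type : doc.type;
}

// Before: the change object itself is treated as the doc, so no type is found.
const buggyInvalidate = (typeId: string) => (change: Change): boolean =>
  typeId === getTypeId(change as unknown as Doc);

// After: the inner doc is unwrapped from the change object first.
const fixedInvalidate = (typeId: string) => ({ doc }: Change): boolean =>
  typeId === getTypeId(doc);
```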

@ChinHairSaintClair
Contributor Author

ChinHairSaintClair commented Apr 1, 2025

@jkuester , the PR should now be ready for the telemetry additions and a bundle size bump.

- Added telemetry and performance tracking for duplicate contact lookup
- Made dupe display more responsive to user selections
- Added and fixed unit and e2e tests (including Enketo widgets)
- Fixed linting issues
- Minor UI tweaks and bug fixes
- Bumped bundle size
@jkuester
Contributor

jkuester commented Apr 9, 2025

Alright we are very close to getting this landed! 🎉 Here are the remaining tasks before I think we can consider this done:

  • Update cht-conf to support the new duplicate_check property: "Support duplicate_check property when uploading contact forms" (cht-conf#675)
    • @ChinHairSaintClair would you possibly be able to look into putting up a PR for this? Should be a very small change. 👍
  • Update cht-docs to document the new feature and configuration options.
    • I am currently working on these changes and hope to raise a PR for review tomorrow.
  • Create a forum thread to collect translation feedback/suggestions for our newly added strings. I am assuming, @ChinHairSaintClair, that you did not source those 9 different translations all from native speakers in your organization 😅 (but maybe you actually did!?!). If not, we can crowdsource this and get feedback from other folks in Medic and the broader community. 👍
    • I will create this forum post and tag @ChinHairSaintClair so hopefully you can help support by making any suggested edits here in this PR 🙏

@jkuester
Contributor

jkuester commented Apr 9, 2025

Here is the docs PR: medic/cht-docs#1819
And here is the forum post regarding translations: https://forum.communityhealthtoolkit.org/t/translation-support-request-duplicate-contact-prevention-strings/4841

Contributor

@jkuester jkuester left a comment

Code is done! Tests are done! Docs are done! cht-conf updates are done! We were only missing translations for two languages and those langs are not required for merging/releasing this PR (can add the translations later when they are available).

So, I think I am going to hit the merge button on this and call it complete! 🚀

@ChinHairSaintClair thank you for your tireless efforts here to get this right! I am super proud of what we have built! (Also a big thank you to @fardarter for your support and assistance!)

@jkuester jkuester merged commit b965953 into medic:master Apr 18, 2025
25 checks passed
@mrjones-plip
Contributor

wow! This PR was an amazing collaborative journey and a thing of real beauty to see in open source and to help the CHT. Thanks all!!

ShaunKrog pushed a commit to ShaunKrog/cht-core that referenced this pull request May 15, 2025
…edic#9609)

Co-authored-by: Joshua Kuestersteffen <jkuester@kuester7.com>

Development

Successfully merging this pull request may close these issues.

Prevent duplicate sibling contact capture

4 participants