The current implementation of our PII policies relies predominantly on regular expressions (Regex). While this approach has served us so far, it can produce a notable number of false positives and missed detections. To improve the accuracy and robustness of our scanner, we propose integrating Named Entity Recognition (NER) techniques, along with other relevant methods, into PII detection.
Background:
Regex checks are efficient at matching specific patterns, but they ignore the context around a match, which leads to both false positives and false negatives.
Named Entity Recognition (NER) is an established Natural Language Processing (NLP) technique for identifying entities in text, such as person names, locations, and organizations. Models trained or fine-tuned for PII can also cover types such as addresses and phone numbers.
By combining Regex with NER, we can potentially improve the precision and recall of our PII detection mechanism.
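To make the false-positive and false-negative problem concrete, here is a small illustrative Python sketch. The SSN pattern, the sample strings, and the `evaluate` helper are hypothetical and are not taken from our scanner. It shows how a single regex can flag a look-alike order number while missing a real value written in a different format, and how precision and recall summarize that.

```python
import re

# Hypothetical, deliberately simple pattern for US Social Security numbers (ddd-dd-dddd).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Tiny hand-labelled sample: (text, set of true SSN strings in that text).
SAMPLES = [
    ("Employee SSN: 123-45-6789", {"123-45-6789"}),
    ("Order reference 555-12-3456 shipped today", set()),  # looks like an SSN, is not one
    ("Her social is 123 45 6789", {"123 45 6789"}),        # real SSN, but no hyphens
    ("Call the helpdesk about ticket 42", set()),
]


def evaluate(pattern, samples):
    """Count true/false positives and false negatives, then derive precision and recall."""
    tp = fp = fn = 0
    for text, truth in samples:
        found = set(pattern.findall(text))
        tp += len(found & truth)  # matched and really PII
        fp += len(found - truth)  # matched but not PII
        fn += len(truth - found)  # PII the pattern missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall


print(evaluate(SSN_PATTERN, SAMPLES))  # → (1, 1, 1, 0.5, 0.5)
```

On this toy sample the regex reaches only 0.5 precision and 0.5 recall. The same counting could be reused to compare regex-only and hybrid scanning on a real labelled evaluation set.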
Proposed Solution:
Integration of NER Models: Incorporate established NER models into the scanning process. Candidate libraries include spaCy and the NER capabilities of Hugging Face's Transformers library.
Hybrid Approach: Use a combined strategy of Regex and NER. Start with Regex checks to rapidly identify potential matches, then use NER to validate and refine those matches, with the aim of reducing false positives and improving overall detection.
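The hybrid flow can be sketched in Python as follows, assuming an NER model that returns entity spans as character offsets plus a label. `CANDIDATE_PATTERNS`, `toy_ner`, `hybrid_scan`, and `Finding` are hypothetical names for illustration. `toy_ner` is a stand-in that only recognizes one hard-coded name; a real deployment would substitute an actual model, for example with spaCy something like `[(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents]`. Entity label names (such as `PERSON`) vary between models and would need mapping.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# An entity span as (start_char, end_char, label); this shape is an assumption.
Entity = Tuple[int, int, str]


@dataclass
class Finding:
    start: int
    end: int
    text: str
    label: str


# Stage 1 patterns. "Two capitalised words" is a naive, hypothetical pattern for
# person names that also matches places such as "New York".
CANDIDATE_PATTERNS = {
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}


def toy_ner(text: str) -> List[Entity]:
    """Hypothetical stand-in for a real NER model: tags one hard-coded name."""
    return [(m.start(), m.end(), "PERSON") for m in re.finditer(r"Alice Johnson", text)]


def hybrid_scan(text: str, ner: Callable[[str], List[Entity]] = toy_ner) -> List[Finding]:
    # Stage 1: cheap regex pass to collect candidate spans.
    candidates = [
        (label, m)
        for label, pattern in CANDIDATE_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    if not candidates:
        return []  # skip the comparatively expensive model when regex finds nothing

    entities = ner(text)  # run the model once per text, not once per candidate
    findings = []
    for label, m in candidates:
        # Stage 2: keep a candidate only if the model tagged an overlapping
        # span with the same label.
        confirmed = any(
            start < m.end() and m.start() < end and ent_label == label
            for start, end, ent_label in entities
        )
        if confirmed:
            findings.append(Finding(m.start(), m.end(), m.group(), label))
    return findings


print(hybrid_scan("Alice Johnson moved to New York last year."))
# → [Finding(start=0, end=13, text='Alice Johnson', label='PERSON')]
```

Here the regex proposes both "Alice Johnson" and "New York", and the NER stage discards the place name. Note that with this ordering NER can only filter regex candidates, so it reduces false positives but cannot recover PII the regex never matched; catching those misses would also require reporting NER spans independently.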