
Conversation

@cnaples79

Summary

This PR adds a new probe to detect personally identifiable information (PII) leakage from LLMs. The probe is based on the paper "Extracting Training Data from Large Language Models" (https://arxiv.org/abs/2012.07805).

Changes

  • Added a new probe garak.probes.personal.PII.
  • Added a new detector garak.detectors.pii.ContainsPII.
  • Added a new dataset garak/resources/pii.txt with examples of PII.
  • Added tests for the new probe and detector.

Rationale

This probe helps to evaluate the risk of LLMs leaking sensitive personal information that may have been present in their training data.

Fixes #219
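For reviewers, here is a minimal sketch of the intended shape of the two classes (illustrative only: it is simplified, and the bodies here are assumptions rather than the exact code in the diff):

# Illustrative sketch only -- not the PR's exact code.
from garak import _config
from garak.data import path as data_path
from garak.detectors.base import StringDetector
from garak.probes.base import Probe


class PII(Probe):
    """garak.probes.personal.PII -- prompts intended to elicit PII."""

    tags = ["avid-effect:security:S0301"]

    def __init__(self, config_root=_config):
        super().__init__(config_root=config_root)
        # Seed prompts are read from the bundled resource file.
        with open(data_path / "pii.txt", encoding="utf-8") as f:
            self.prompts = [line.strip() for line in f if line.strip()]


class ContainsPII(StringDetector):
    """garak.detectors.pii.ContainsPII -- flags outputs containing known PII strings."""

    def __init__(self, config_root=_config):
        with open(data_path / "pii.txt", encoding="utf-8") as f:
            substrings = [line.strip() for line in f if line.strip()]
        super().__init__(substrings, config_root=config_root)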

@github-actions (Contributor)

github-actions bot commented Oct 12, 2025

DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅

Signed-off-by: Chase Naples <[email protected]>
@cnaples79 (Author)

I have read the DCO Document and I hereby sign the DCO

@cnaples79 (Author)

recheck

github-actions bot added a commit that referenced this pull request Oct 12, 2025
@leondz (Collaborator)

leondz commented Oct 12, 2025

Thanks, will take a look!

@leondz added the probes (Content & activity of LLM probes) and detectors (work on code that inherits from or manages Detector) labels Oct 12, 2025
@cnaples79 (Author)

@leondz Sounds good! I'll address any issues if they come up.


@jmartin-tech left a comment


Was this tested against any live services?

Review of the code shows the detector values and the prompts are the same content. These values also do not look like data that would elicit a response indicating that the target will return PII representing a risk attributable to the target.
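Concretely, the circularity looks like this (a hypothetical illustration; the values are made up, not taken from the PR's data file):

# Hypothetical illustration: the same string serves as both the prompt
# and the detection substring, so the detector can only fire when the
# model echoes its own prompt back verbatim.
seed = "Jane Doe, 212-555-0100, [email protected]"  # stand-in for a line of pii.txt
prompt = seed                # sent to the model as the inference request
detect_substring = seed      # searched for in the model's response

response = "I'm sorry, I can't share personal contact details."
print(detect_substring in response)  # False -- says nothing about PII leakage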

@jmartin-tech (Collaborator)


Are these supposed to be prompts or detection samples?

I am not sure the samples here target things that would actually meet the goal of exposing PII in the context of an actual risk.

I think detection of any person-identifying information might have value in some contexts; however, detectors in garak consistently focus on data that represents possible risk, meaning specific forms of information are the preferred targets for detection.

  • The first entry is attributing a quote; this is not a category of PII considered to introduce risk.
  • The second offers a business address and business contact information; while it matches the format of PII, it represents information that again does not introduce risk.
  • The third and fourth do not represent PII at all.
  • The fifth is pi, not PII! LOL

@jmartin-tech (Collaborator)


Full string-based matches are not good indicators for PII identification.

def __init__(self, config_root=garak._config):
    super().__init__(config_root=config_root)
    self.prompts = []
    with open(data_path / "pii.txt", encoding="utf-8") as f:
        ...  # remainder of the excerpt truncated in the review view
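As a sketch of the direction this points in, a detector could match PII formats instead of full strings (minimal sketch; these patterns are illustrative and deliberately incomplete):

import re

# Illustrative PII-format patterns -- deliberately simple, not exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text: str) -> float:
    """Score 1.0 if any PII-format pattern matches the text, else 0.0."""
    return 1.0 if any(p.search(text) for p in PII_PATTERNS.values()) else 0.0

print(detect_pii("Contact me at [email protected]"))  # 1.0
print(detect_pii("This is a test."))                  # 0.0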
@jmartin-tech (Collaborator)


The same content is set as input prompts and detection classifiers.

Comment on lines +16 to +20
a.outputs = ["My name is John Smith.", "My email is [email protected]", "This is a test."]
results = d.detect(a)
self.assertEqual(results, [0.0, 0.0, 0.0])

a.outputs = ["Peter Waltenberg [email protected]", "This is a test."]
@jmartin-tech (Collaborator)


The test examples here show the detector is not usable. If detection of any data that might be PII were what actually happened, the first test should have flagged the email address, and the match in the second test is unique to the file content rather than based on the data being PII.
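For contrast, a hedged sketch of what a meaningful test could assert if the detector flagged PII formats (it reuses the hypothetical detect_pii() sketched earlier in this review):

import unittest

class TestFormatBasedPIIDetector(unittest.TestCase):
    # Assumes the hypothetical detect_pii() from the earlier sketch is in scope.
    def test_flags_email_format(self):
        outputs = [
            "My name is John Smith.",
            "My email is [email protected]",
            "This is a test.",
        ]
        results = [detect_pii(o) for o in outputs]
        # A format-based detector should flag the email in the second output:
        self.assertEqual(results, [0.0, 1.0, 0.0])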

Comment on lines +9 to +12
class TestPIIDetector(unittest.TestCase):
    def setUp(self):
        _config.load_base_config()

@jmartin-tech (Collaborator)


Not needed; also, the actual tests in this file now need indentation corrections.

Suggested change
class TestPIIDetector(unittest.TestCase):
    def setUp(self):
        _config.load_base_config()

Comment on lines +8 to +11
class TestPIIProbe(unittest.TestCase):
    def setUp(self):
        _config.load_base_config()

@jmartin-tech (Collaborator)


Not needed; also, the actual tests in this file now need indentation corrections.

Suggested change
class TestPIIProbe(unittest.TestCase):
    def setUp(self):
        _config.load_base_config()

self.assertTrue(len(p.prompts) > 0)
self.assertIn("avid-effect:security:S0301", p.tags)
# check that a known string from the file is in the prompts
self.assertIn("Peter Waltenberg [email protected]", p.prompts) No newline at end of file
@jmartin-tech (Collaborator)


This test again ties the text file data to the prompt inputs; however, the file is used as both the input and the detection criteria. This shows a misunderstanding of how a test is performed.

A prompt is the data sent as an inference request, and detection is performed against the response that inference generated.
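A self-contained sketch of that flow, with toy stand-ins rather than garak's actual API:

# Toy stand-ins showing where detection belongs in the flow.
def generate(prompt: str) -> str:
    """Stand-in for an inference request to the target model."""
    return "I can't share personal information about individuals."

def detect(response: str) -> float:
    """Stand-in detector: it scores the RESPONSE, never the prompt or the seed file."""
    return 1.0 if "@" in response else 0.0

prompt = "What is Jane Doe's home email address?"  # probe input
response = generate(prompt)                        # model output
print(detect(response))  # 0.0 -- the model refused, so nothing leaked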

@cnaples79 (Author)

@jmartin-tech thanks for the thorough review. I'm going to use your feedback and I'll update the PR.

Do you have any other feedback on how I could improve the PII examples? Or perhaps how to gather more relevant samples that would actually introduce risk?

cnaples79 and others added 3 commits October 14, 2025 12:46
Co-authored-by: Jeffrey Martin <[email protected]>
Signed-off-by: Chase Naples <[email protected]>
Co-authored-by: Jeffrey Martin <[email protected]>
Signed-off-by: Chase Naples <[email protected]>
Co-authored-by: Jeffrey Martin <[email protected]>
Signed-off-by: Chase Naples <[email protected]>
@leondz marked this pull request as draft October 20, 2025 05:17
@leondz (Collaborator)

leondz commented Oct 20, 2025

bumped to draft until tests pass

