Commit 623c64b

Document PII in Enterprise DS-4779
1 parent 02b84a3 commit 623c64b

1 file changed: +49 -15 lines changed


docs/Developer-Guide.md

Lines changed: 49 additions & 15 deletions
@@ -581,52 +581,86 @@ By default, SWIRL loads **English stopwords**. To change this:
581581

582582
## Redact or Remove Personally Identifiable Information (PII)
583583

584-
SWIRL supports **PII removal and redaction** using [Microsoft Presidio](https://microsoft.github.io/presidio/).
584+
SWIRL supports redaction or removal of PII from queries and results, via [Microsoft Presidio](https://microsoft.github.io/presidio/).
585585

586-
**`RemovePIIQueryProcessor` (Redacts Queries)**
586+
**`RedactPIIQueryProcessor`**
587587

588-
Removes PII **before querying**.
588+
This processor redacts PII entities in queries. For example: `Captain James T. Kirk` → `Captain [PERSON]`
589589

590-
*Enable for a Specific SearchProvider:*
590+
To enable it for a specific SearchProvider, add it before the `Adaptive` or `NoMod` query processor:
591591

592592
```json
593593
"query_processors": [
594-
"AdaptiveQueryProcessor",
595-
"RemovePIIQueryProcessor"
594+
"RedactPIIQueryProcessor",
595+
"AdaptiveQueryProcessor"
596596
]
597597
```
598598

599-
*Enable for ALL SearchProviders:*
599+
{.warning}
600+
If the API receiving the redacted PII can't handle brackets `[]`, use the `AdaptiveQueryProcessor` *after* PII redaction to remove them.
600601

601-
Modify `swirl/models.py`:
602+
**`RemovePIIQueryProcessor`**
603+
604+
This processor removes detected PII entities from queries entirely.
605+
606+
To enable it for a specific SearchProvider, add it before the `Adaptive` or `NoMod` query processor:
607+
608+
```json
609+
"query_processors": [
610+
"RemovePIIQueryProcessor",
611+
"AdaptiveQueryProcessor"
612+
]
613+
```
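
The difference between redaction and removal can be illustrated with a minimal Python sketch. It assumes entity detection has already happened (SWIRL delegates that step to Presidio) and works from a hypothetical list of `(start, end, entity_type)` spans. The function names `redact` and `remove` are illustrative only; they are not SWIRL's or Presidio's API.

```python
import re

# Hypothetical span format: (start, end, entity_type), as a detector might report.
Span = tuple[int, int, str]

def redact(text: str, spans: list[Span]) -> str:
    """Replace each detected span with a bracketed entity label."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

def remove(text: str, spans: list[Span]) -> str:
    """Delete each detected span, then collapse the leftover whitespace."""
    for start, end, _ in sorted(spans, reverse=True):
        text = text[:start] + text[end:]
    return re.sub(r"\s+", " ", text).strip()

query = "Captain James T. Kirk"
spans = [(8, 21, "PERSON")]  # "James T. Kirk", hard-coded for illustration

redacted = redact(query, spans)  # keeps a placeholder where the entity was
removed = remove(query, spans)   # drops the entity entirely
```

Redaction leaves a bracketed placeholder in the query, which is why a downstream API that cannot handle `[]` may need the brackets stripped afterwards; removal leaves no trace of the entity at all.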
614+
615+
To add either of these to the pre-query processing pipeline, so that it runs before any SearchProvider query processing, use one of the following options:
616+
617+
1. Add it to the `search.prequery_processing` list. This is only supported via the SWIRL API.
618+
619+
2. Modify `swirl/models.py`:
602620

603621
```python
604622
def getSearchPreQueryProcessorsDefault():
605623
    return ["RemovePIIQueryProcessor"]
606624
```
607625

608-
More details: [ResultProcessors](./Developer-Reference#result-processors)
626+
Then restart SWIRL. [Contact support](#support) for assistance.
609627

610-
**`RemovePIIResultProcessor` (Redacts Results)**
628+
For more information: [ResultProcessors](./Developer-Reference#result-processors)
611629

612-
Redacts PII **in results** (e.g., `"James T. Kirk"` → `"<PERSON>"`).
630+
**`RedactPIIResultProcessor`**
613631

614-
*Enable for a Specific SearchProvider:*
632+
Redacts PII in results. For example, in a document: `These are the logs of Captain James T. Kirk.` → `These are the logs of Captain [PERSON].`
615633

616634
```json
617635
"result_processors": [
618636
"MappingResultProcessor",
619637
"DateFinderResultProcessor",
620638
"CosineRelevancyResultProcessor",
621-
"RemovePIIResultProcessor"
639+
"RedactPIIResultProcessor"
622640
]
623641
```
624642

625643
More details: [ResultProcessors](./Developer-Reference#post-result-processors)
626644

627-
**`RemovePIIPostResultProcessor`**
645+
{.note}
646+
There is no `RemovePIIResultProcessor` at this time, as removing PII from results may impair the use of AI.
647+
648+
**`RedactPIIPostResultProcessor`**
649+
650+
This processor redacts PII in the unified results from all responding sources.
651+
652+
To add this processor to the post-result processing pipeline, so that it runs on the unified results, use one of the following options:
653+
654+
1. Add it to the Search object's list of post-result processors. This is only supported via the SWIRL API.
655+
656+
2. Modify `swirl/models.py`:
657+
658+
```python
659+
def getSearchPostResultProcessorsDefault():
660+
    return ["CosineRelevancyPostResultProcessor", "RedactPIIPostResultProcessor"]
661+
```
628662

629-
This processor applies **PII redaction after all results are processed**.
663+
This configuration re-ranks using entities, but then redacts them in the results displayed to the user. This leaves the entities in the explain vector, which is available via the API. To prevent this, [disable the explain vector by setting `SWIRL_EXPLAIN` to `False`](TBD).
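
The effect of this ordering can be sketched in a few lines of Python. This is a toy model, not SWIRL code: `score_relevancy` and `redact_results` are hypothetical stand-ins for the two post-result processors, the result dictionaries are invented, and the entity to redact is hard-coded rather than detected.

```python
def score_relevancy(results, query):
    """Hypothetical stand-in for relevancy ranking: count query-term hits."""
    terms = query.lower().split()
    for r in results:
        body = r["body"].lower()
        r["score"] = sum(body.count(t) for t in terms)
        # Like an explain structure, this records the matched terms verbatim.
        r["explain"] = [t for t in terms if t in body]
    return sorted(results, key=lambda r: r["score"], reverse=True)

def redact_results(results, entity, label="[PERSON]"):
    """Hypothetical stand-in for redaction: mask a known entity in the body."""
    for r in results:
        r["body"] = r["body"].replace(entity, label)
    return results

results = [
    {"body": "Stardate logs of the Enterprise."},
    {"body": "These are the logs of Captain Kirk."},
]

# Rank first (entities still visible), then redact what the user will see.
ranked = score_relevancy(results, "kirk logs")
final = redact_results(ranked, "Kirk")
```

Because ranking runs first, the entity still contributes to the score. Because only the displayed body is masked, the matched term also survives in the `explain` list, which mirrors the caveat described above.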
630664

631665
## Understand the Explain Structure
632666
