Commit 2aebc58

Merge branch 'main' of https://github.com/EleutherAI/delphi
2 parents e62a0eb + 12ba461 commit 2aebc58

4 files changed: +11 -2 lines changed
README.md

Lines changed: 2 additions & 2 deletions
@@ -16,15 +16,15 @@ Install this library as a local editable installation. Run the following command

 To run the default pipeline from the command line, use the following command:

-`python -m delphi EleutherAI/pythia-160m EleutherAI/Pythia-160m-SST-k32-32k --n_tokens 10_000_000 --max_latents 100 --hookpoints layers.5 --scorers detection --filter_bos --name llama-3-8B`
+`python -m delphi EleutherAI/pythia-160m EleutherAI/Pythia-160m-SST-k32-32k --n_tokens 10_000_000 --max_latents 100 --hookpoints layers.5.mlp --scorers detection --filter_bos --name llama-3-8B`

 This command will:
 1. Cache activations for the first 10 million tokens of the default dataset, `EleutherAI/SmolLM2-135M-10B`.
 2. Generate explanations for the first 100 features of layer 5 using the default explainer model, `hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4`.
 3. Score the explanations using the detection scorer.
 4. Log summary metrics including per-scorer F1 scores and confusion matrices, and produce histograms of the scorer classification accuracies.

-The pipeline is highly configurable and can also be called programmatically (see the [end-to-end test](https://github.com/EleutherAI/delphi/blob/main/delphi/tests/e2e.py) for an example).
+The pipeline is highly configurable and can also be called programmatically (see the [end-to-end test](https://github.com/EleutherAI/delphi/blob/main/tests/e2e.py) for an example).

 To use experimental features, create a custom pipeline. You can take inspiration from the main pipeline in [delphi.\_\_main\_\_](https://github.com/EleutherAI/delphi/blob/main/delphi/__main__.py).
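For readers who want to script the corrected README command, the sketch below assembles the same invocation as an argv list. The helper name `build_delphi_command` is hypothetical and not part of delphi; only the model names and flags that appear in the README line above are used, and nothing is launched unless the commented-out `subprocess` call is enabled.

```python
import sys


def build_delphi_command(hookpoint: str = "layers.5.mlp") -> list[str]:
    """Return the README's example invocation as an argv list (hypothetical helper)."""
    return [
        sys.executable, "-m", "delphi",
        "EleutherAI/pythia-160m",              # base model, as in the README
        "EleutherAI/Pythia-160m-SST-k32-32k",  # sparse model, as in the README
        "--n_tokens", "10_000_000",
        "--max_latents", "100",
        "--hookpoints", hookpoint,             # the value this commit corrects
        "--scorers", "detection",
        "--filter_bos",
        "--name", "llama-3-8B",
    ]


cmd = build_delphi_command()
print(" ".join(cmd[1:]))

# To actually start the run (assumes delphi and its dependencies are installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues and makes it easy to swap the hookpoint when sweeping over layers.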

delphi/__main__.py

Lines changed: 1 addition & 0 deletions
@@ -256,6 +256,7 @@ def scorer_postprocess(result, score_dir):
                 n_examples_shown=run_cfg.num_examples_per_scorer_prompt,
                 verbose=run_cfg.verbose,
                 log_prob=run_cfg.log_probs,
+                fuzz_type=run_cfg.fuzz_type,
             )
         elif scorer_name == "detection":
             scorer = DetectionScorer(

delphi/config.py

Lines changed: 4 additions & 0 deletions
@@ -160,6 +160,10 @@ class RunConfig(Serializable):
     )
     """Scorer methods to score latent explanations. Options are 'fuzz', 'detection', and
     'simulation'."""
+    fuzz_type: Literal["default", "active"] = "default"
+    """Type of fuzzing to use for the fuzz scorer. Default uses non-activating
+    examples and highlights n_incorrect tokens. Active uses activating examples
+    and highlights non-activating tokens."""

     name: str = ""
     """The name of the run. Results are saved in a directory with this name."""
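The new field is typed with `Literal`, which on its own is only a static hint. Below is a minimal sketch of how such a field can be validated at runtime and forwarded to a scorer, assuming a plain dataclass. `RunConfigSketch` and `scorer_kwargs` are hypothetical stand-ins; the real `RunConfig` subclasses `Serializable` and may handle validation differently.

```python
from dataclasses import dataclass
from typing import Literal, get_args

FuzzType = Literal["default", "active"]


@dataclass
class RunConfigSketch:
    """Hypothetical stand-in holding only the two fields shown in the diff."""

    fuzz_type: FuzzType = "default"
    name: str = ""

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so check the value explicitly.
        allowed = get_args(FuzzType)
        if self.fuzz_type not in allowed:
            raise ValueError(
                f"fuzz_type must be one of {allowed}, got {self.fuzz_type!r}"
            )


def scorer_kwargs(cfg: RunConfigSketch) -> dict:
    """Collect the keyword argument that the commit forwards to the fuzz scorer."""
    return {"fuzz_type": cfg.fuzz_type}


print(scorer_kwargs(RunConfigSketch(fuzz_type="active")))
```

Keeping the allowed values in one `Literal` alias means the type checker and the runtime check cannot drift apart.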

delphi/scorers/classifier/fuzz.py

Lines changed: 4 additions & 0 deletions
@@ -38,6 +38,10 @@ def __init__(
                 it harder for models to generate anwers in the correct format.
             log_prob: Whether to use log probabilities to allow for AUC calculation.
             generation_kwargs: Additional generation kwargs.
+            temperature: Which temperature to use for the scorer model.
+            fuzz_type: Which type of fuzzing to use. Default uses non-activating
+                examples and highlights n_incorrect tokens. Active uses activating
+                examples and highlights non-activating tokens.
         """
         super().__init__(
             client=client,
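The docstring describes the two modes only in words. The toy function below is one reading of that description, not the scorer's actual implementation: `pick_false_highlights` is a hypothetical name, activations are modelled as a list of floats with zero meaning "not activating", and using `n_incorrect` as the highlight count in the `active` mode is an assumption.

```python
import random
from typing import Literal, Sequence

FuzzType = Literal["default", "active"]


def pick_false_highlights(
    activations: Sequence[float],
    fuzz_type: FuzzType,
    n_incorrect: int,
    seed: int = 0,
) -> list[int]:
    """Choose token indices to highlight incorrectly (illustrative sketch only)."""
    has_active = any(a > 0 for a in activations)
    if fuzz_type == "default":
        # Default mode: start from an example where the latent never fires.
        if has_active:
            raise ValueError("default fuzzing expects a non-activating example")
    elif fuzz_type == "active":
        # Active mode: start from an example where the latent does fire.
        if not has_active:
            raise ValueError("active fuzzing expects an activating example")
    else:
        raise ValueError(f"unknown fuzz_type: {fuzz_type!r}")

    # In both modes the highlighted tokens are ones the latent did not fire on.
    inactive = [i for i, a in enumerate(activations) if a <= 0]
    k = min(n_incorrect, len(inactive))
    return sorted(random.Random(seed).sample(inactive, k))


print(pick_false_highlights([0.0, 1.5, 0.0, 0.0], "active", n_incorrect=2))
```

Under this reading, the difference between the modes is the source example rather than which tokens get highlighted: `active` produces harder negatives because the surrounding context really does contain the feature.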
