Restructure backend manifest JSON with hierarchical keys and unified parameters

## Problem

The current backend manifest JSON structure has several issues:

1. **Concatenated string keys**: Keys like `"phase2_pipeline_zeroshot_comparative_regression_mistralai/Mistral-7B-Instruct-v0.3_merit-0.0"` are hard to parse and filter
2. **Config duplication**: The full experiment config is duplicated in every scenario
3. **Poor structure**: It is difficult to query by a specific ADM, LLM, or KDMA parameter
4. **Artificial scenario indexing**: The generator appends indices to scenario IDs instead of using the actual scene structure from `input.full_state.meta_info.scene_id`
5. **No clear separation**: Scenarios and scenes are conflated
6. **File duplication**: Filtered copies of `input_output.json` files are created unnecessarily
7. **Limited extensibility**: It is hard to add new parameter dimensions

## Enhanced Solution

After further analysis, we've refined the approach to use a **flexible parameter-based structure** with integrity validation and fast lookup indices.

### Key Design Decisions

1. **Flexible parameters**: No rigid hierarchy, so the structure can handle ADMs with or without LLMs and future extensions
2. **No file duplication**: Reference the original files and point into them with source indices
3. **Hash-based experiment keys**: Deterministic keys generated from parameter hash
4. **Fast lookups**: Reverse indices for efficient filtering
5. **Integrity validation**: File checksums prevent stale data issues

### New Structure

```json
{
  "manifest_version": "2.0",
  "generated_at": "2025-07-18T15:30:00Z",
  "experiments": {
    "exp_a1b2c3d4": {
      "parameters": {
        "adm": {
          "name": "phase2_pipeline_zeroshot_comparative_regression",
          "instance": { /* full ADM config */ }
        },
        "llm": {
          "model_name": "mistralai/Mistral-7B-Instruct-v0.3",
          "precision": "half"
        },  // or null for non-LLM ADMs
        "kdma_values": [{"kdma": "merit", "value": 0.5}],  // [] for unaligned
        "alignment_target_id": "ADEPT-June2025-merit-0.5",
        "run_variant": "default"
      },
      "scenarios": {
        "June2025-MF1-eval": {
          "input_output": {
            "file": "data/2025-06-23__12-28-29/input_output.json",
            "checksum": "sha256:a1b2c3d4e5f6...",
            "alignment_target_filter": "ADEPT-June2025-merit-0.5"
          },
          "scores": null,
          "timing": "data/2025-06-23__12-28-29/timing.json",
          "scenes": {
            "Probe 1": { "source_index": 5, "scene_id": "Probe 1" },
            "Probe 5": { "source_index": 12, "scene_id": "Probe 5" }
          }
        }
      }
    }
  },
  "indices": {
    "by_adm": {
      "phase2_pipeline_zeroshot_comparative_regression": ["exp_a1b2c3d4"],
      "rule_based_baseline": ["exp_e5f6g7h8"]
    },
    "by_llm": {
      "mistralai/Mistral-7B-Instruct-v0.3": ["exp_a1b2c3d4"],
      "no-llm": ["exp_e5f6g7h8"]
    },
    "by_kdma": {
      "merit-0.5": ["exp_a1b2c3d4"],
      "unaligned": ["exp_e5f6g7h8"]
    },
    "by_scenario": {
      "June2025-MF1-eval": ["exp_a1b2c3d4", "exp_e5f6g7h8"]
    }
  },
  "files": {
    "data/2025-06-23__12-28-29/input_output.json": {
      "checksum": "sha256:a1b2c3d4e5f6...",
      "size": 2048576,
      "experiments": ["exp_a1b2c3d4", "exp_e5f6g7h8"]
    }
  }
}
```
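
One way a consumer could use the `indices` section is to intersect the experiment-key lists for each requested filter. The sketch below is illustrative only: `find_experiments` is a hypothetical helper, not an existing function, and the manifest is a trimmed-down dict shaped like the example above.

```python
from typing import Optional


def find_experiments(manifest: dict,
                     adm: Optional[str] = None,
                     llm: Optional[str] = None,
                     kdma: Optional[str] = None,
                     scenario: Optional[str] = None) -> list:
    """Return the sorted experiment keys that match every filter that was given."""
    indices = manifest["indices"]
    requested = {"by_adm": adm, "by_llm": llm, "by_kdma": kdma, "by_scenario": scenario}

    result = None  # None means "no filter applied yet"
    for index_name, wanted in requested.items():
        if wanted is None:
            continue
        matches = set(indices.get(index_name, {}).get(wanted, []))
        result = matches if result is None else result & matches

    if result is None:  # no filters given: every experiment matches
        result = set(manifest["experiments"])
    return sorted(result)


# Trimmed-down manifest using the keys and labels from the example above.
manifest = {
    "experiments": {"exp_a1b2c3d4": {}, "exp_e5f6g7h8": {}},
    "indices": {
        "by_adm": {
            "phase2_pipeline_zeroshot_comparative_regression": ["exp_a1b2c3d4"],
            "rule_based_baseline": ["exp_e5f6g7h8"],
        },
        "by_llm": {
            "mistralai/Mistral-7B-Instruct-v0.3": ["exp_a1b2c3d4"],
            "no-llm": ["exp_e5f6g7h8"],
        },
        "by_kdma": {"merit-0.5": ["exp_a1b2c3d4"], "unaligned": ["exp_e5f6g7h8"]},
        "by_scenario": {"June2025-MF1-eval": ["exp_a1b2c3d4", "exp_e5f6g7h8"]},
    },
}

print(find_experiments(manifest, adm="rule_based_baseline", llm="no-llm"))
```

Because each lookup is a dictionary access followed by a set intersection, filtering never has to walk the `experiments` object itself.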

### Experiment Key Generation

Keys are generated deterministically by hashing the identifying parameters (ADM name, LLM model name, KDMA values, and run variant):

```javascript
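// Assumes a Node.js runtime; the original snippet left sha256 undefined.
const { createHash } = require('crypto');

// Build a short, deterministic experiment key from the identifying parameters.
function generateExperimentKey(parameters) {
  const keyData = {
    adm: parameters.adm.name,
    llm: parameters.llm?.model_name || "no-llm",  // null llm means a non-LLM ADM
    // Sort so the key does not depend on the order of kdma_values; [] becomes "unaligned"
    kdma: parameters.kdma_values.map(kv => `${kv.kdma}-${kv.value}`).sort().join('_') || "unaligned",
    run_variant: parameters.run_variant || "default"
  };

  // keyData is built with a fixed property order, so the JSON string is stable.
  const hash = createHash('sha256').update(JSON.stringify(keyData)).digest('hex');
  return `exp_${hash.substring(0, 8)}`;
}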
```
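
Since the backend files named in the tasks are Python, a Python counterpart may be a useful starting point. This is a minimal sketch under stated assumptions, not the project's implementation: `generate_experiment_key` is a hypothetical name, and the serialization here (sorted keys, compact separators) makes no attempt to produce the same hash as the JavaScript version.

```python
import hashlib
import json


def generate_experiment_key(parameters: dict) -> str:
    """Derive a short deterministic key from the identifying parameters."""
    llm = parameters.get("llm") or {}  # "llm": null marks a non-LLM ADM
    kdma_parts = sorted(
        f"{kv['kdma']}-{kv['value']}" for kv in parameters.get("kdma_values") or []
    )
    key_data = {
        "adm": parameters["adm"]["name"],
        "llm": llm.get("model_name") or "no-llm",
        "kdma": "_".join(kdma_parts) or "unaligned",  # [] means unaligned
        "run_variant": parameters.get("run_variant") or "default",
    }
    # Sorted keys and fixed separators keep the serialized form stable.
    canonical = json.dumps(key_data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"exp_{digest[:8]}"


example = {
    "adm": {"name": "phase2_pipeline_zeroshot_comparative_regression"},
    "llm": {"model_name": "mistralai/Mistral-7B-Instruct-v0.3", "precision": "half"},
    "kdma_values": [{"kdma": "merit", "value": 0.5}],
    "run_variant": "default",
}
print(generate_experiment_key(example))
```

Note that only the fields copied into `key_data` affect the key, so two experiments that differ only in, say, `precision` would collide unless that field is added to the hashed data.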

### Handling Complex Cases

1. **Multiple experiments per file**: Each experiment gets its own entry that references the shared file, with its own `source_index` values
2. **ADMs without LLMs**: Use `"llm": null`
3. **Unaligned experiments**: Use `"kdma_values": []`
4. **Run variants**: Included in the hashed parameters, so each variant gets a unique key
5. **Future parameters**: Added as new fields on the `parameters` object
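
The null-LLM and unaligned cases also determine which labels end up in the reverse indices. The following sketch shows one way the indices could be derived from the `experiments` object. `build_indices` and `kdma_label` are hypothetical helpers; the `"no-llm"` and `"unaligned"` labels are taken from the example manifest above.

```python
from collections import defaultdict


def kdma_label(kdma_values: list) -> str:
    """Join KDMA name/value pairs into an index label; [] maps to "unaligned"."""
    parts = sorted(f"{kv['kdma']}-{kv['value']}" for kv in kdma_values)
    return "_".join(parts) or "unaligned"


def build_indices(experiments: dict) -> dict:
    """Build the by_adm / by_llm / by_kdma / by_scenario reverse indices."""
    names = ("by_adm", "by_llm", "by_kdma", "by_scenario")
    indices = {name: defaultdict(list) for name in names}

    for key in sorted(experiments):  # sorted for reproducible output
        exp = experiments[key]
        params = exp["parameters"]
        llm = params.get("llm") or {}  # "llm": null means a non-LLM ADM

        indices["by_adm"][params["adm"]["name"]].append(key)
        indices["by_llm"][llm.get("model_name", "no-llm")].append(key)
        indices["by_kdma"][kdma_label(params.get("kdma_values") or [])].append(key)
        for scenario_id in exp.get("scenarios", {}):
            indices["by_scenario"][scenario_id].append(key)

    # Convert defaultdicts to plain dicts so the result serializes cleanly.
    return {name: dict(index) for name, index in indices.items()}
```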

## Benefits

- **No file duplication**: Keep original files, use indices
- **Fast queries**: Pre-built indices for common filters
- **Extensible**: Easy to add new parameter types
- **Integrity**: Checksums prevent stale data issues
- **Flexible**: Handles ADMs with or without LLMs and leaves room for future ADM/LLM combinations
- **Efficient**: No duplicated configs or copied data files, so the manifest and its data footprint stay smaller
- **Maintainable**: Clear separation of concerns
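
To make the integrity point concrete, here is a rough sketch of how the `files` section could be checked against what is on disk. `file_checksum` and `find_stale_files` are hypothetical names, and the `sha256:<hex>` format is inferred from the example manifest rather than from a written spec.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks and return it in the manifest's "sha256:<hex>" form."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def find_stale_files(manifest: dict, root=".") -> list:
    """Return manifest file paths that are missing or whose checksum changed."""
    stale = []
    for rel_path, entry in manifest.get("files", {}).items():
        path = Path(root) / rel_path
        if not path.is_file() or file_checksum(path) != entry["checksum"]:
            stale.append(rel_path)
    return sorted(stale)
```

A non-empty result would signal that the manifest should be regenerated before any `source_index` values are trusted.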

## Implementation Tasks

- [ ] Update `ExperimentConfig` to generate flexible parameter-based keys  
- [ ] Modify `GlobalManifest` class to build enhanced structure with indices
- [ ] Update `experiment_parser.py` to extract scene_id and build source_index mappings
- [ ] Add file checksum calculation and integrity validation
- [ ] Implement reverse mapping indices (by_adm, by_llm, by_kdma, by_scenario)
- [ ] Handle ADMs without LLMs (use null approach)
- [ ] Update frontend to consume new structure and use indices for fast queries
- [ ] Add comprehensive tests for enhanced manifest structure
- [ ] Implement backward compatibility during transition
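
As a starting point for the `experiment_parser.py` task, the scene mapping could be built roughly as follows. This sketch assumes that `input_output.json` parses to a list of records, each carrying `input.full_state.meta_info.scene_id`, and that the first record for a scene is the one to reference. `build_scene_map` is a hypothetical name, and `alignment_target_filter` handling is left out because the record layout for alignment targets is not described here.

```python
def build_scene_map(records: list) -> dict:
    """Map each scene_id to the index of its first record in the source file."""
    scenes = {}
    for source_index, record in enumerate(records):
        meta = record.get("input", {}).get("full_state", {}).get("meta_info", {})
        scene_id = meta.get("scene_id")
        if scene_id is None or scene_id in scenes:
            continue  # skip records without a scene; keep the first occurrence
        scenes[scene_id] = {"source_index": source_index, "scene_id": scene_id}
    return scenes


# Hypothetical records, reduced to the one field this sketch reads.
records = [
    {"input": {"full_state": {"meta_info": {"scene_id": "Probe 1"}}}},
    {"input": {"full_state": {"meta_info": {}}}},
    {"input": {"full_state": {"meta_info": {"scene_id": "Probe 5"}}}},
]
print(build_scene_map(records))
```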

This approach provides an extensible foundation that can grow with the system's needs while addressing the structural issues listed under Problem.
