Commit b6edf8e

Merge pull request #5303 from pablintino/devex-mco-sanitize-main
MCO-1685: Add mco-sanitize utility main logic
2 parents 9b4e510 + 2e453ab commit b6edf8e

207 files changed

+37947
-2
lines changed

devex/cmd/mco-sanitize/README.md

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
# mco-sanitize

A command-line tool that removes sensitive information from Machine Config Operator (MCO) must-gather reports while
preserving their structure for debugging and analysis purposes.

## Overview

`mco-sanitize` is designed to sanitize OpenShift must-gather reports by redacting sensitive data from Kubernetes
resources according to configurable rules. The tool maintains the original file structure and metadata while
replacing sensitive content with redaction markers that preserve data length information for analysis.

## Features

- **Configurable Redaction**: Define which Kubernetes resource types and fields to sanitize via YAML configuration
- **Parallel Processing**: Multi-threaded file processing for improved performance (defaults to CPU core count)
- **Encrypted Output**: Automatically creates GPG-encrypted tar.gz archives of sanitized data
- **Multiple File Format Support**: Handles YAML, JSON, and other text-based files
- **Path-based Targeting**: Use dot-notation paths to target specific fields within resources
- **Array Support**: Process all elements (`*`) or specific indices in arrays
- **Namespace Filtering**: Optionally limit redaction to specific namespaces

## Installation

Build the tool from source:

```bash
go build -o mco-sanitize .
```

## Usage

### Basic Usage

```bash
# Sanitize a must-gather directory
./mco-sanitize --input /path/to/must-gather

# Sanitize and create encrypted archive
./mco-sanitize --input /path/to/must-gather --output /path/to/sanitized.tar.gz

# Use custom worker count
./mco-sanitize --input /path/to/must-gather --workers 8
```

### Command Line Options

- `--input` (required): Path to the must-gather directory to sanitize
- `--output` (optional): Path where the encrypted tar.gz output should be saved
- `--workers` (optional): Number of worker threads (defaults to CPU core count)
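
The `--workers` behavior can be pictured as a small worker pool. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: `processFiles` and its `handle` callback are hypothetical names, and the only behavior taken from this README is that the worker count falls back to the CPU core count.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processFiles fans the given paths out to a fixed number of goroutines.
// A non-positive worker count falls back to the number of CPU cores.
// handle is a hypothetical per-file callback, not part of mco-sanitize.
func processFiles(paths []string, workers int, handle func(string) error) []error {
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	jobs := make(chan string)
	var (
		mu   sync.Mutex
		errs []error
		wg   sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker drains the shared queue until it is closed.
			for p := range jobs {
				if err := handle(p); err != nil {
					mu.Lock()
					errs = append(errs, err)
					mu.Unlock()
				}
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return errs
}

func main() {
	paths := []string{"a.yaml", "b.yaml", "c.json"}
	errs := processFiles(paths, 0, func(p string) error {
		return nil // a real handler would sanitize the file here
	})
	fmt.Println("failures:", len(errs))
}
```

Collecting errors instead of stopping at the first one is a design choice of this sketch; whether the real tool aborts early is not stated here.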

## Configuration

### Default Configuration

The tool includes a built-in default configuration that redacts sensitive fields from common MCO resources:

```yaml
redact:
  - kind: MachineConfig
    apiVersion: machineconfiguration.openshift.io/v1
    paths:
      - spec.config.storage.files.*.contents
      - spec.config.systemd.units.*.contents
      - spec.config.systemd.units.*.dropins.*.contents
  - kind: ControllerConfig
    apiVersion: machineconfiguration.openshift.io/v1
    paths:
      - spec.internalRegistryPullSecret
      - spec.kubeAPIServerServingCAData
      - spec.rootCAData
      - spec.additionalTrustBundle
```

### Custom Configuration

Override the default configuration using the `MCO_MUST_GATHER_SANITIZER_CFG` environment variable:

```bash
# Using a configuration file
export MCO_MUST_GATHER_SANITIZER_CFG="/path/to/config.yaml"

# Using base64-encoded configuration
export MCO_MUST_GATHER_SANITIZER_CFG="cmVkYWN0Og0KIC0ga2luZDogU2VjcmV0..."
```
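
Because the same variable accepts either a file path or base64 content, the value has to be disambiguated somehow. The sketch below shows one plausible way to do that in Go. It is an illustration only: `resolveConfig` is a hypothetical helper, and the order of checks (existing file first, base64 second) is an assumption rather than documented behavior.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

const cfgEnvVar = "MCO_MUST_GATHER_SANITIZER_CFG"

// resolveConfig returns the raw YAML bytes for a value that is either a file
// path or a base64-encoded document. ok is false when the value is empty,
// which signals that the built-in default should be used.
func resolveConfig(value string) (data []byte, ok bool, err error) {
	if value == "" {
		return nil, false, nil
	}
	// Assumption: an existing regular file wins over base64 decoding.
	if info, statErr := os.Stat(value); statErr == nil && !info.IsDir() {
		data, err = os.ReadFile(value)
		return data, err == nil, err
	}
	data, err = base64.StdEncoding.DecodeString(value)
	if err != nil {
		return nil, false, fmt.Errorf("%s is neither a readable file nor valid base64: %w", cfgEnvVar, err)
	}
	return data, true, nil
}

func main() {
	data, ok, err := resolveConfig(os.Getenv(cfgEnvVar))
	switch {
	case err != nil:
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	case !ok:
		fmt.Println("using built-in default configuration")
	default:
		fmt.Printf("loaded %d bytes of custom configuration\n", len(data))
	}
}
```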

#### Configuration Format

```yaml
redact:
  - kind: Pod                    # Kubernetes resource kind (required)
    apiVersion: v1               # API version (optional, matches all if empty)
    namespaces:                  # Limit to specific namespaces (optional)
      - kube-system
      - openshift-config
    paths:                       # Fields to redact using dot notation
      - spec.containers.*.env.*.value
      - data.password
      - metadata.annotations.secret-key
```
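
The matching rules described in the comments above (required `kind`, optional `apiVersion`, optional `namespaces`) can be sketched as a small predicate. The `Rule` type and `Matches` method are hypothetical names for illustration, and exact, case-sensitive string comparison is an assumption.

```go
package main

import "fmt"

// Rule mirrors one entry of the `redact` list. The Go field names are
// illustrative and not taken from the mco-sanitize source.
type Rule struct {
	Kind       string
	APIVersion string
	Namespaces []string
	Paths      []string
}

// Matches reports whether the rule applies to a resource.
// An empty APIVersion matches every version; an empty Namespaces list
// matches every namespace.
func (r Rule) Matches(kind, apiVersion, namespace string) bool {
	if r.Kind != kind {
		return false
	}
	if r.APIVersion != "" && r.APIVersion != apiVersion {
		return false
	}
	if len(r.Namespaces) == 0 {
		return true
	}
	for _, ns := range r.Namespaces {
		if ns == namespace {
			return true
		}
	}
	return false
}

func main() {
	rule := Rule{
		Kind:       "Pod",
		APIVersion: "v1",
		Namespaces: []string{"kube-system", "openshift-config"},
		Paths:      []string{"spec.containers.*.env.*.value"},
	}
	fmt.Println(rule.Matches("Pod", "v1", "kube-system"))
	fmt.Println(rule.Matches("Pod", "v1", "default"))
}
```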

#### Path Syntax

- Use dot notation to navigate object hierarchies: `spec.containers.0.image`
- Use `*` for all array elements: `spec.containers.*.env.*.value`
- Use numeric indices for specific array elements: `spec.containers.0.ports.1.containerPort`
- Combine object and array navigation: `data.config.yaml.databases.*.password`
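
To make the path syntax concrete, here is a minimal sketch of how such a path could be walked over a decoded YAML or JSON document. The function names `applyPath` and `walk` are hypothetical, and the sketch assumes that every dot separates two segments. How the real tool treats map keys that themselves contain dots is not stated here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// walk applies fn to every value addressed by the remaining path segments
// and stores fn's result in place of the addressed value.
func walk(node any, segs []string, fn func(any) any) any {
	if len(segs) == 0 {
		return fn(node)
	}
	head, rest := segs[0], segs[1:]
	switch n := node.(type) {
	case map[string]any:
		// Object navigation: follow the key if it exists.
		if child, ok := n[head]; ok {
			n[head] = walk(child, rest, fn)
		}
	case []any:
		if head == "*" {
			// Wildcard: descend into every array element.
			for i := range n {
				n[i] = walk(n[i], rest, fn)
			}
		} else if i, err := strconv.Atoi(head); err == nil && i >= 0 && i < len(n) {
			// Numeric index: descend into one element only.
			n[i] = walk(n[i], rest, fn)
		}
	}
	return node
}

// applyPath splits a dot-notation path and rewrites all matching fields.
// Paths that do not resolve are silently ignored.
func applyPath(doc any, path string, fn func(any) any) any {
	return walk(doc, strings.Split(path, "."), fn)
}

func main() {
	doc := map[string]any{
		"spec": map[string]any{
			"containers": []any{
				map[string]any{
					"image": "example",
					"env":   []any{map[string]any{"name": "TOKEN", "value": "secret"}},
				},
			},
		},
	}
	applyPath(doc, "spec.containers.*.env.*.value", func(any) any { return "<redacted>" })
	fmt.Println(doc)
}
```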

## Encryption

### Default Encryption

By default, archives are encrypted using an embedded GPG public key. This ensures that sanitized data remains
secure during transport and storage.

### Custom Encryption Key

Provide your own GPG public key using the `MCO_MUST_GATHER_SANITIZER_KEY` environment variable:

```bash
# Using base64-encoded public key
export MCO_MUST_GATHER_SANITIZER_KEY="$(base64 -w 0 < /path/to/public-key.asc)"
```

## Redaction Behavior

When a field is redacted, it's replaced with a structured object containing:

```yaml
_REDACTED: "This field has been redacted"
length: 1234  # Original content length in characters
```

This preserves:
- The fact that sensitive data existed
- The approximate size of the original data
- The overall structure of the resource
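
The replacement object described above could be built along these lines. This is a sketch, not the tool's code: `redactionMarker` is a hypothetical helper, and counting Unicode code points (rather than bytes) is an assumption based on the phrase "length in characters".

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// redactionMarker builds the object that stands in for a redacted string.
// Assumption: "length in characters" is read as the number of runes.
func redactionMarker(original string) map[string]any {
	return map[string]any{
		"_REDACTED": "This field has been redacted",
		"length":    utf8.RuneCountInString(original),
	}
}

func main() {
	fmt.Println(redactionMarker("c2VjcmV0IGNvbnRlbnQ="))
}
```

Because the marker is a map rather than a string, a consumer that expects a scalar at the redacted path needs to tolerate the type change.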

## Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `MCO_MUST_GATHER_SANITIZER_CFG` | Custom configuration (file path or base64) | `/path/to/config.yaml` |
| `MCO_MUST_GATHER_SANITIZER_KEY` | Custom GPG public key (base64-encoded) | `LS0tLS1CRUdJTi...` |

## Development

### Running Tests

```bash
go test ./...
```

### Adding New Redaction Rules

1. Update the default configuration in `data/default-config.yaml`
2. Add test cases in the `testdata/` directory
3. Run tests to verify behavior

devex/cmd/mco-sanitize/archive.go

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
// Assisted-by: Claude
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	_ "embed"
	"encoding/base64"
	"errors"
	"io"
	"os"
	"path/filepath"

	"github.com/ProtonMail/go-crypto/openpgp"
)

const McoMustGatherSanitizerEncryptKeyEnvVar = "MCO_MUST_GATHER_SANITIZER_KEY"

// pabrodri/pablintino's GPG key by default
//
//go:embed data/public-key.asc
var defaultGpgKey []byte

func Archive(src, target string) (err error) {
	entityList, err := getEncryptionKey()
	if err != nil {
		return err
	}
	buf := &bytes.Buffer{}
	if err := writeTar(src, buf); err != nil {
		return err
	}

	targetFile, err := os.OpenFile(target, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0o666)
	if err != nil {
		return err
	}
	defer func() {
		if dstErr := targetFile.Close(); dstErr != nil {
			err = errors.Join(err, dstErr)
		}
	}()
	return encrypt(entityList, buf, targetFile)
}

func getEncryptionKey() (openpgp.EntityList, error) {
	// Early load and fail if the key is not present/valid
	content := os.Getenv(McoMustGatherSanitizerEncryptKeyEnvVar)
	var gpgBuffer *bytes.Buffer

	if content == "" {
		gpgBuffer = bytes.NewBuffer(defaultGpgKey)
	} else {
		rawBytes, err := base64.StdEncoding.DecodeString(content)
		if err != nil {
			return nil, err
		}
		gpgBuffer = bytes.NewBuffer(rawBytes)
	}

	entityList, err := openpgp.ReadArmoredKeyRing(gpgBuffer)
	if err != nil {
		return nil, err
	}
	return entityList, err
}

func writeTar(src string, writer io.Writer) (err error) {
	zr := gzip.NewWriter(writer)
	defer func() {
		if zrErr := zr.Close(); zrErr != nil {
			err = errors.Join(err, zrErr)
		}
	}()
	tw := tar.NewWriter(zr)
	defer func() {
		if twErr := tw.Close(); twErr != nil {
			err = errors.Join(err, twErr)
		}
	}()

	// walk through every file in the folder
	return filepath.Walk(src, func(file string, fi os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		relPath, err := filepath.Rel(src, file)
		if err != nil {
			return err
		}
		header, err := tar.FileInfoHeader(fi, relPath)
		if err != nil {
			return err
		}

		header.Name = filepath.ToSlash(relPath)

		if err := tw.WriteHeader(header); err != nil {
			return err
		}
		// if not a dir, write file content
		if !fi.IsDir() {
			data, err := os.Open(file)
			if err != nil {
				return err
			}
			defer data.Close()
			if err != nil {
				return err
			}
			if _, err := io.Copy(tw, data); err != nil {
				return err
			}
		}
		return nil
	})
}

func encrypt(entities []*openpgp.Entity, reader io.Reader, writer io.Writer) (err error) {
	gpgWriter, err := openpgp.Encrypt(writer, entities, nil, &openpgp.FileHints{IsBinary: true}, nil)
	if err != nil {
		return err
	}
	defer func() {
		if wCloseErr := gpgWriter.Close(); wCloseErr != nil {
			err = errors.Join(err, wCloseErr)
		}
	}()
	if _, err := io.Copy(gpgWriter, reader); err != nil {
		return err
	}
	return err
}

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
package main

import (
	"os"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestArchive_Success(t *testing.T) {
	// Create a temporary directory with test files
	tempDir := t.TempDir()
	testFile := filepath.Join(tempDir, "test.txt")
	require.NoError(t, os.WriteFile(testFile, []byte("test content"), 0644))

	// Create a temporary output file
	outputFile := filepath.Join(t.TempDir(), "output.tar.gz.gpg")

	// Test the Archive function
	err := Archive(tempDir, outputFile)

	assert.NoError(t, err)
	assert.FileExists(t, outputFile)

	// Verify the output file is not empty
	stat, err := os.Stat(outputFile)
	require.NoError(t, err)
	assert.Greater(t, stat.Size(), int64(0))

	// Verify the file is GPG encrypted by checking for GPG binary markers
	fileContent, err := os.ReadFile(outputFile)
	require.NoError(t, err)

	// GPG encrypted files start with packet type markers (high bit set, indicating packet type)
	assert.True(t, len(fileContent) > 0 && fileContent[0] >= 0x80,
		"File should be GPG encrypted (expected GPG packet marker >= 0x80, got 0x%02x)", fileContent[0])
}

func TestArchive_NonExistentSource(t *testing.T) {
	nonExistentDir := "/non/existent/directory"
	outputFile := filepath.Join(t.TempDir(), "output.tar.gz.gpg")

	err := Archive(nonExistentDir, outputFile)

	assert.Error(t, err)
}

func TestArchive_InvalidTarget(t *testing.T) {
	tempDir := t.TempDir()
	testFile := filepath.Join(tempDir, "test.txt")
	require.NoError(t, os.WriteFile(testFile, []byte("test content"), 0644))

	// Try to write to a directory that doesn't exist
	invalidTarget := "/non/existent/directory/output.tar.gz.gpg"

	err := Archive(tempDir, invalidTarget)

	assert.Error(t, err)
}

func TestArchive_EmptyDirectory(t *testing.T) {
	tempDir := t.TempDir()
	outputFile := filepath.Join(t.TempDir(), "output.tar.gz.gpg")

	err := Archive(tempDir, outputFile)

	assert.NoError(t, err)
	assert.FileExists(t, outputFile)
}
