Skip to content

feat: add HDSentinel backup-drive monitoring#16

Open
chrismin13 wants to merge 3 commits intomainfrom
feat/hdsentinel-support
Open

feat: add HDSentinel backup-drive monitoring#16
chrismin13 wants to merge 3 commits intomainfrom
feat/hdsentinel-support

Conversation

@chrismin13
Copy link
Copy Markdown
Owner

Integrate HDSentinel into the shared drive-health flow so backup-drive health changes surface in the UI and alerts while staying testable in fake mode.

Integrate HDSentinel into the shared drive-health flow so backup-drive health changes surface in the UI and alerts while staying testable in fake mode.
Copilot AI review requested due to automatic review settings April 1, 2026 23:57
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR integrates HDSentinel into the existing drive-health flow so backup-drive health (via SMART + ML prediction and HDSentinel status) is surfaced in the UI, scheduled checks, and the summary API while remaining usable in fake mode.

Changes:

  • Added a unified drive_health.py module to centralize SMART telemetry/prediction plus HDSentinel snapshot/state + alerting.
  • Updated the Drive Health UI + endpoint handling to display HDSentinel status and allow toggling HDSentinel monitoring/health-change alerts.
  • Switched the systemd health-check entrypoint from check_health.sh to check_health.py and updated install/docs to support HDSentinel installation.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
templates/drive_health.html Adds HDSentinel monitoring section + settings form and routes POST actions via form_action.
app.py Uses the new shared drive-health module; adds HDSentinel data to UI flow and API summary.
drive_health.py New shared implementation for SMART read/prediction/telemetry and HDSentinel snapshot + state + alerting.
scripts/check_health.py New systemd-friendly Python entrypoint that runs the scheduled health check and prints a concise summary.
scripts/check_health.sh Reduced to a compatibility wrapper that delegates to check_health.py.
system_utils.py Installs the new script and updates systemd service ExecStart + config generation for HDSentinel settings.
install.sh Adds optional HDSentinel download/install and ensures required packages are installed.
config_manager.py Adds default [hdsentinel] settings to the generated config.
docs/manual_install.md Documents installing the vendor HDSentinel binary and required packages.
docs/drive_health.md Updates Drive Health documentation to include HDSentinel status/settings + alert behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

chrismin13 and others added 2 commits April 2, 2026 03:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +673 to +674
except Exception as exc:
settings_error = f"Failed to save HDSentinel settings: {exc}"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UI error banner for saving HDSentinel settings interpolates the raw exception text into the response. This can leak internal details (paths, config values, stack-adjacent messages) to any authenticated user. Consider logging the exception server-side and returning a generic user-facing message (optionally with a short correlation ID).

Copilot uses AI. Check for mistakes.
Comment on lines +1383 to +1387
hdsentinel_enabled = False
if isinstance(hdsentinel_settings, dict):
hdsentinel_enabled = bool(hdsentinel_settings.get('enabled'))
else:
hdsentinel_enabled = bool(getattr(hdsentinel_settings, 'enabled', False))
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_hdsentinel_settings() always returns a dict, but the API summary code includes a dict-vs-attribute branch. This extra branching is dead code and makes the endpoint harder to read/maintain; simplify to a single dict access (and keep the payload shape stable).

Suggested change
hdsentinel_enabled = False
if isinstance(hdsentinel_settings, dict):
hdsentinel_enabled = bool(hdsentinel_settings.get('enabled'))
else:
hdsentinel_enabled = bool(getattr(hdsentinel_settings, 'enabled', False))
hdsentinel_enabled = bool(hdsentinel_settings.get('enabled'))

Copilot uses AI. Check for mistakes.
) as tmp_file:
json.dump(payload, tmp_file, indent=2)
tmp_path = Path(tmp_file.name)
tmp_path.replace(path)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_write_json_atomically writes the HDSentinel state file but never sets permissions on the resulting file. Because snapshots include drive serial/model data, it’s safer to explicitly restrict readability (e.g., chmod 0o600 after replace) and/or ensure the parent dir permissions are appropriate.

Suggested change
tmp_path.replace(path)
tmp_path.replace(path)
if os.name == "posix":
try:
os.chmod(path, 0o600)
except OSError as exc:
LOGGER.warning("Failed to set permissions on %s: %s", path, exc)

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +107
case "$arch" in
amd64)
url="https://www.hdsentinel.com/hdslin/hdsentinel-020c-x64.zip"
;;
arm64)
url="https://www.hdsentinel.com/hdslin/hdsentinel-armv8.zip"
;;
*)
echo -e "${YELLOW}HDSentinel auto-install skipped: unsupported architecture '${arch:-unknown}'.${NC}"
return 0
;;
esac

tmpdir=$(mktemp -d)
if ! curl -L --fail --output "$tmpdir/hdsentinel.zip" "$url"; then
echo -e "${YELLOW}HDSentinel download failed. Continuing without it.${NC}"
rm -rf "$tmpdir"
return 0
fi

if ! unzip -o "$tmpdir/hdsentinel.zip" -d "$tmpdir" >/dev/null; then
echo -e "${YELLOW}HDSentinel extraction failed. Continuing without it.${NC}"
rm -rf "$tmpdir"
return 0
fi

if [ ! -f "$tmpdir/HDSentinel" ]; then
echo -e "${YELLOW}HDSentinel binary not found in downloaded archive. Continuing without it.${NC}"
rm -rf "$tmpdir"
return 0
fi

install -m 755 "$tmpdir/HDSentinel" "$HDSENTINEL_BIN"
rm -rf "$tmpdir"
echo -e "${GREEN}✔ HDSentinel installed to $HDSENTINEL_BIN.${NC}\n"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The installer downloads and installs a third-party binary (HDSentinel) over the network without any integrity verification (checksum/signature) or pinning beyond a URL. This is a supply-chain risk; consider verifying a published checksum/signature, or making the install an explicit opt-in step with clear instructions for manual verification.

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +10
if [ -x "/opt/SimpleSaferServer/scripts/check_health.py" ]; then
exec /usr/bin/python3 "/opt/SimpleSaferServer/scripts/check_health.py"
fi

if [ -z "$parent_device" ]; then
send_email "Drive Health Check Failed - Cannot Determine Parent Device" "Cannot determine parent device for $partition_device"
exit 1
fi

echo "Checking health of device: $parent_device"

# Get SMART attributes using smartctl
echo "Getting SMART attributes..."
smart_attrs=$(smartctl -A "$parent_device" | grep -E "^( [0-9]+| [0-9]+)" | grep -E "^( 5| 9| 12|177|194|197|198|199|200|201|202|203|204|205|206|207|208|209|211|212|220|221|222|223|224|225|226|227|228|230|231|232|233|234|235|240|241|242|250|251|252|254|255)" | awk '{print $2 " " $10}')

if [ -z "$smart_attrs" ]; then
send_email "Drive Health Check Failed - No SMART Data" "Could not retrieve SMART data from $parent_device"
exit 1
fi

# Convert to JSON format for Python script
json_data="{\"smart_attrs\":{"
while read -r line; do
if [ ! -z "$line" ]; then
attr_id=$(echo $line | awk '{print $1}')
value=$(echo $line | awk '{print $2}')
json_data="${json_data}\"$attr_id\":$value,"
fi
done <<< "$smart_attrs"
json_data="${json_data%,}}}"

echo "SMART data collected, making health prediction..."

# Call Python script to make prediction
prediction=$(python3 /opt/SimpleSaferServer/scripts/predict_health.py "$json_data")

if [ $? -ne 0 ]; then
send_email "Drive Health Check Failed - Prediction Error" "Error running health prediction script"
exit 1
fi

# Parse prediction result
probability=$(echo $prediction | jq -r '.probability')
prediction_result=$(echo $prediction | jq -r '.prediction')

if [ "$prediction_result" = "1" ]; then
send_email "Drive Health Warning" "Drive health check failed with probability $probability. Drive: $parent_device"
exit 1
else
echo "Drive health check passed with probability $probability"
exit 0
fi No newline at end of file
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
exec /usr/bin/python3 "$SCRIPT_DIR/check_health.py"
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script hard-codes the Python interpreter path as /usr/bin/python3. On some distros/containers python3 may live elsewhere; using /usr/bin/env python3 (or relying on the shebang in check_health.py) makes the wrapper more portable.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants