
fix(amazon-orders): reject auth-wall HTML #1416

Open
bobeglz wants to merge 4 commits into mvanhorn:main from bobeglz:fix/amazon-orders-auth-html-guard

@bobeglz bobeglz commented Jun 30, 2026

Contributor

Summary

Amazon Orders now fails honestly when Amazon returns sign-in, claim, CAPTCHA, or challenge HTML with HTTP 200. Before this, doctor could report a fresh/usable setup while order reads, search, and sync parsed or cached login pages as successful data; now those paths return auth errors and point the operator back to auth login --chrome and doctor.

The fix guards authenticated HTML before parsing, before SQLite write-through, before workflow/archive persistence, and before sync writes. doctor now validates the real order-history surface and flags tainted local cache rows, while local order detail/invoice lookups use the requested order ID instead of the static endpoint path.
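The guard-before-persist idea described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `interstitialError`, `checkInterstitial`, `guardedStore`, and the marker strings are all hypothetical names and values chosen for the example, and a real classifier would need to cover far more page variants.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// interstitialError is a hypothetical typed error for this sketch. A distinct
// type lets callers tell "Amazon served an auth wall" apart from other failures.
type interstitialError struct{ marker string }

func (e *interstitialError) Error() string {
	return fmt.Sprintf("auth interstitial detected (%s): run auth login --chrome, then doctor", e.marker)
}

// interstitialMarkers are assumed, illustrative substrings (lowercase);
// they are not the markers the PR actually checks.
var interstitialMarkers = []string{`name="signin"`, `id="ap_email"`, `captcha`}

// checkInterstitial returns a typed error when the body looks like a
// sign-in or challenge page, even though the HTTP status was 200.
func checkInterstitial(body []byte) error {
	lower := strings.ToLower(string(body))
	for _, m := range interstitialMarkers {
		if strings.Contains(lower, m) {
			return &interstitialError{marker: m}
		}
	}
	return nil
}

// isInterstitial reports whether err wraps an interstitialError.
func isInterstitial(err error) bool {
	var ie *interstitialError
	return errors.As(err, &ie)
}

// guardedStore runs the guard first and only then hands the body to store,
// so an auth wall is never parsed or persisted as order data.
func guardedStore(body []byte, store func([]byte) error) error {
	if err := checkInterstitial(body); err != nil {
		return fmt.Errorf("refusing to persist response: %w", err)
	}
	return store(body)
}
```

Because the error is wrapped with `%w`, a caller several layers up can still use `isInterstitial` to map it to an auth-specific exit path.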

Validation

  • go test ./internal/parser ./internal/cli -run 'AuthInterstitial|OrderHistory|ResolveRead|SyncResource|ValidateOrderDetail|DoctorLiveCredential|CollectCacheReport|FetchOrderListPages|TrackRejects|WorkflowArchive|SyncAuthInterstitial'
  • go test ./...
  • go build ./cmd/amazon-orders-pp-cli
  • go vet ./...
  • git diff --check
  • Live smoke on current local Amazon session: doctor --agent --fail-on error reports stale browser proof plus tainted orders/transactions cache; find Aqara, orders get 702-5010515-8774615, and sync --resources orders,transactions --since 90d --strict all fail with explicit auth/interstitial errors instead of returning false data.

Compound Engineering
Codex

@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Contributor

Greptile Summary

This PR adds an authenticated-HTML guard to the Amazon Orders CLI to prevent Amazon's sign-in, CAPTCHA, and challenge pages (returned with HTTP 200) from being parsed or cached as valid order data. The fix introduces a typed authInterstitialPageError and a new DetectOrderHistoryPage function that is used to validate every live fetch before parsing or SQLite write-through.

  • A new validateBrowserSession helper replaces the inline body-check in validateAndWriteBrowserSessionProof and is reused by doctorLiveCredentialStatus, which now forces NoCache and performs a live authenticated-content probe.
  • All read paths (resolveRead, resolvePaginatedRead, fetchOrderListPages, syncResource, channel_workflow) are guarded with AuthInterstitialError before parsing or persistence; local reads also skip or reject tainted SQLite rows written by older binaries.
  • resolveLocal gains an explicit localID parameter, fixing the previous bug where order-detail and invoice lookups keyed off the static path segment instead of the request's orderID; orders list, transactions, and gift_cards are corrected to use list semantics (isList: true) in the local store.
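To make the `localID` bullet concrete, here is a minimal sketch of how a lookup keyed off a static path segment misses, while one keyed off the requested ID hits. Everything in it is assumed for illustration: the `localStore` type, both function names, and the example path are hypothetical and do not reflect the real `resolveLocal` signature or store layout.

```go
package main

import "strings"

// localStore is a hypothetical stand-in for the SQLite store:
// resource name -> row ID -> stored payload.
type localStore map[string]map[string]string

// lastSegment returns the text after the final "/" in path.
func lastSegment(path string) string {
	return path[strings.LastIndex(path, "/")+1:]
}

// lookupByPathSegment shows the failure mode: the key comes from the static
// endpoint path, so every order-detail request looks up the same wrong row.
func lookupByPathSegment(s localStore, resource, path string) (string, bool) {
	row, ok := s[resource][lastSegment(path)]
	return row, ok
}

// lookupByLocalID prefers the explicit ID supplied by the caller and falls
// back to the path segment only when no ID was given.
func lookupByLocalID(s localStore, resource, path, localID string) (string, bool) {
	id := localID
	if id == "" {
		id = lastSegment(path)
	}
	row, ok := s[resource][id]
	return row, ok
}
```

With a store holding one order under its order ID, `lookupByPathSegment` with a path ending in a fixed segment such as `order-details` finds nothing, while `lookupByLocalID` with the order ID returns the row.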

Confidence Score: 5/5

Safe to merge; the guard is applied consistently across all authenticated HTML read paths and the critical test suite covers the main failure modes addressed by this PR.

The changes are well-tested with targeted fixtures for sign-in HTML in multiple locales, mismatched order IDs, local cache taint detection, and doctor live-probe scenarios. The core logic — a typed error, a shared classifier, and a writeThroughCacheValidated split — is straightforward and the guard is applied at every entry point before parse or persistence.
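The "validated write-through" split mentioned above might look something like the sketch below, where the cache write requires a validator and refuses to store anything the validator rejects. The names `validatedCache`, `writeThroughValidated`, and `rejectSignInTitle` are hypothetical, and the `sign-in` substring check is an assumed placeholder rather than the PR's classifier.

```go
package main

import (
	"bytes"
	"errors"
)

var (
	errNoValidator = errors.New("write-through requires a validator")
	errSignInPage  = errors.New("body looks like a sign-in page")
)

// validatedCache is a hypothetical in-memory stand-in for the SQLite cache.
type validatedCache struct {
	rows map[string][]byte
}

// rejectSignInTitle is an assumed, deliberately simple validator for the sketch.
func rejectSignInTitle(body []byte) error {
	if bytes.Contains(bytes.ToLower(body), []byte("sign-in")) {
		return errSignInPage
	}
	return nil
}

// writeThroughValidated stores body under key only after validate accepts it.
// Making the validator a required argument means no call site can cache an
// unchecked response by accident.
func (c *validatedCache) writeThroughValidated(key string, body []byte, validate func([]byte) error) error {
	if validate == nil {
		return errNoValidator
	}
	if err := validate(body); err != nil {
		return err
	}
	if c.rows == nil {
		c.rows = make(map[string][]byte)
	}
	c.rows[key] = body
	return nil
}
```

The design choice being illustrated is that validation is part of the write API itself rather than a step each caller has to remember.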

No files require special attention; all core paths are guarded and tested.

Important Files Changed

| Filename | Overview |
| --- | --- |
| library/commerce/amazon-orders/internal/parser/auth_interstitial.go | Adds typed authInterstitialPageError, IsAuthInterstitialError, and DetectOrderHistoryPage; expands locale title coverage to DE/FR/IT/PT/JA/ZH for both sign-in detection and order-history validation. |
| library/commerce/amazon-orders/internal/cli/data_source.go | Adds localID parameter to resolveRead/resolveLocal, introduces authenticatedGetWithHeaders/authenticatedPaginatedGet (shallow-copy NoCache enforcement), and guards auto/live paths and list local reads against interstitial rows. |
| library/commerce/amazon-orders/internal/cli/auth.go | Extracts validateBrowserSession helper that checks interstitial and DetectOrderHistoryPage before writing the session proof; validateAndWriteBrowserSessionProof and doctorLiveCredentialStatus share this helper. |
| library/commerce/amazon-orders/internal/cli/doctor.go | Adds doctorLiveCredentialStatus (no-cache probe via copy) gated behind a valid proof check; collectCacheReport now scans FTS for tainted interstitial rows and marks cache status as error when found. |
| library/commerce/amazon-orders/internal/cli/sync.go | Adds interstitial guard in syncResource immediately after c.Get; auth interstitial errors are now counted as critical failures regardless of the criticalResources map. |
| library/commerce/amazon-orders/internal/cli/novel_helpers.go | Switches to authenticatedGet (NoCache) and changes mid-walk interstitial from a soft stop (return partial results) to a hard error on any page, preventing incomplete scans from appearing successful. |
| library/commerce/amazon-orders/internal/cli/channel_workflow.go | Adds an interstitial guard before JSON parsing in the workflow archive loop, but still calls c.Get() directly rather than authenticatedGet(), so caching behavior is inconsistent with other guarded paths. |
| library/commerce/amazon-orders/internal/cli/orders_get.go | Passes args[0] as localID to resolveRead and adds validateOrderDetailMatchesRequestedID post-parse check; also adds validatePageContainsRequestedOrderID helper used by the invoice command. |
| library/commerce/amazon-orders/internal/cli/orders_invoice.go | Passes args[0] as localID to resolveRead and adds validatePageContainsRequestedOrderID check before HTML extraction. |
| library/commerce/amazon-orders/internal/cli/orders_list.go | Corrects isList from false to true, fixing local reads to use db.List instead of db.Get(orders, orders). |
| library/commerce/amazon-orders/internal/cli/promoted_transactions.go | Corrects isList to true so local reads use list semantics rather than a get-by-path-segment lookup that would always miss. |
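As a rough illustration of the requested-order-ID checks listed for orders_get.go and orders_invoice.go, a page-level check could be as simple as confirming that the returned HTML mentions the ID that was asked for. The function below is a hypothetical sketch under that assumption; `pageContainsOrderID` is not the PR's `validatePageContainsRequestedOrderID`, whose actual logic is not shown here.

```go
package main

import (
	"bytes"
	"fmt"
)

// pageContainsOrderID returns an error unless the fetched page mentions the
// order ID the caller requested. A page for a different order, or a generic
// landing page, fails the check before any HTML extraction runs.
func pageContainsOrderID(page []byte, orderID string) error {
	if orderID == "" {
		return fmt.Errorf("no order ID requested")
	}
	if !bytes.Contains(page, []byte(orderID)) {
		return fmt.Errorf("page does not reference requested order %s", orderID)
	}
	return nil
}
```

A substring match is intentionally loose; it only rules out pages that never mention the order, which is the failure this kind of check targets.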

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant User
    participant CLI as CLI Command
    participant DS as resolveRead / fetchOrderListPages
    participant HTTP as authenticatedGet (NoCache copy)
    participant Amazon
    participant Guard as AuthInterstitialError
    participant Store as SQLite Store

    User->>CLI: orders get / sync / doctor
    CLI->>DS: dispatch request
    DS->>HTTP: "GET (NoCache=true shallow copy)"
    HTTP->>Amazon: HTTPS request (no cache)
    Amazon-->>HTTP: HTTP 200 + HTML (may be sign-in page)
    HTTP-->>DS: raw bytes

    DS->>Guard: AuthInterstitialError(data)
    alt Interstitial page detected
        Guard-->>DS: authInterstitialPageError
        DS-->>CLI: error (exit 4)
        CLI-->>User: auth error + remediation hint
    else Not interstitial
        Guard-->>DS: nil
        DS->>Store: writeThroughCacheValidated (validated only)
        DS-->>CLI: parsed data + provenance
        CLI-->>User: results
    end

    Note over DS,Store: Local reads also check each stored row with AuthInterstitialError before serving cached data

Reviews (4): Last reviewed commit: "fix(amazon-orders): recognize localized ..."

Comment thread library/commerce/amazon-orders/internal/cli/doctor.go Outdated
Comment thread library/commerce/amazon-orders/internal/cli/data_source.go Outdated
Comment thread library/commerce/amazon-orders/internal/cli/orders_invoice.go
Comment thread library/commerce/amazon-orders/internal/cli/doctor.go Outdated
Comment thread library/commerce/amazon-orders/internal/parser/auth_interstitial.go