[codex] Add PresentationDocument typed representation #1023
Business rationale:
PPTX files are a core business document format, and Osaurus needs a typed presentation representation before later high-fidelity conversion, analysis, and tool workflows can preserve slide structure and speaker notes. This slice gives downstream document flows a narrow, traceable presentation model instead of relying only on plain text.

Coding rationale:
Add Sendable value types for a presentation read model with source provenance markers, plus a read-only PPTX adapter that enforces the existing size-limit API and extracts only slide text and speaker notes from selected OpenXML zip entries. Tests generate tiny PPTX fixtures at runtime to avoid binary fixture churn and cover canHandle, text, notes, size limits, and corrupt zip failures.
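
To make the shape of the read model concrete, here is a minimal sketch of what Sendable value types with provenance markers could look like. Only the names `PresentationDocument` and `SourceProvenance` appear in this PR; every field, the nested type names (`Slide`, `SlideElement`, `SpeakerNote`), and the `plainText` helper are assumptions for illustration, and theme data is left out.

```swift
// Hypothetical sketch: only PresentationDocument and SourceProvenance are names
// from the PR description. All fields and the other types are assumed.

/// Records which zip entry inside the .pptx a value was read from.
struct SourceProvenance: Sendable, Equatable {
    let zipEntryPath: String  // e.g. "ppt/slides/slide1.xml"
}

/// One piece of text extracted from a slide (assumed shape).
struct SlideElement: Sendable, Equatable {
    let text: String
    let provenance: SourceProvenance
}

/// Speaker notes attached to a slide (assumed shape).
struct SpeakerNote: Sendable, Equatable {
    let text: String
    let provenance: SourceProvenance
}

struct Slide: Sendable, Equatable {
    let index: Int  // 1-based slide number
    let elements: [SlideElement]
    let note: SpeakerNote?  // nil when the slide has no notes
}

struct PresentationDocument: Sendable, Equatable {
    let slides: [Slide]

    /// Plain-text fallback: each slide's text followed by its notes, in slide order.
    var plainText: String {
        slides.map { slide in
            var lines = slide.elements.map(\.text)
            if let note = slide.note { lines.append("Notes: " + note.text) }
            return lines.joined(separator: "\n")
        }.joined(separator: "\n\n")
    }
}
```

Because every stored property is an immutable value type, the compiler can check the `Sendable` conformances without `@unchecked`, which is the main reason to model a read-only document this way.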
CI note: current test-core red is the shared ModelPickerItemCache / RemoteProviderManagerRefreshTests singleton race, not this branch's implementation. PPTXAdapter/PresentationDocument tests are passing in the failed run. The race is isolated and fixed in #1025; after that lands, this branch should just need a rebase/rerun.

@tpae PR-specific CI note / next step: This branch is failing. Required next step: merge the green base fix in #1025 first, then I will rebase/rerun this PR and update its draft/readiness status once CI is meaningful again. Why: this PR should not be debugged as a presentation-file failure until the known red base is fixed.
Business rationale
PPTX files are a core business-document format for strategy, finance, sales, and planning workflows. Osaurus needs a typed presentation representation before later high-fidelity conversion, analysis, and agent tooling can preserve slide structure and speaker notes instead of flattening decks into undifferentiated text.
Coding rationale
This adds a narrow Sendable value model for presentation documents plus a read-only PPTX adapter that extracts slide text and speaker notes from selected OpenXML zip entries. The adapter stays inside the existing `DocumentFormatAdapter` API, honors size limits, and returns a `StructuredDocument` with both typed representation and plain-text fallback.
What changed
- `PresentationDocument` value types with slide, element, speaker-note, theme, and `SourceProvenance` markers.
- `PPTXAdapter` for `.pptx`/PPTX UTI handling, selected zip-entry reads, OpenXML text extraction, size-limit checks, and corrupt zip/read error handling.
Validation
- `git fetch origin && git rebase origin/main`
- `swift build --package-path Packages/OsaurusCore`
- `swift build -c release --package-path Packages/OsaurusCore`
- `swift test --package-path Packages/OsaurusCore` failed twice in unrelated `ModelPickerItemCacheTests.notificationBurst_doesNotTransientlyEmptyItems` (`final.count` dropped during full-suite parallel run). The same test passes in isolation with `swift test --package-path Packages/OsaurusCore --filter ModelPickerItemCacheTests/notificationBurst_doesNotTransientlyEmptyItems`.
- `swift test --package-path Packages/OsaurusCore --filter PPTXAdapter`
- `xcrun swift-format lint --strict Packages/OsaurusCore/Models/Documents/PresentationDocument.swift Packages/OsaurusCore/Services/Documents/PPTXAdapter.swift Packages/OsaurusCore/Tests/Documents/PPTXAdapterTests.swift`
- `swiftlint --strict --config .swiftlint.yml --no-cache --use-script-input-files` for touched non-test Swift files; repo SwiftLint config excludes `Packages/OsaurusCore/Tests`, so `PPTXAdapterTests.swift` was also checked with a temporary no-exclude config matching the repo's disabled rules.
- `git diff --check && git diff --check origin/main...HEAD`
Non-scope
- `DocumentAdaptersBootstrap.registerBuiltIns()` is not present on current `origin/main`; this PR avoids inventing a parallel registry.
Residual risks
- `notesSlideN.xml` numbering rather than slide relationship traversal.
- `ModelPickerItemCacheTests` race that reproduces only in the full parallel suite and passes in isolation.
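
For reviewers weighing the first residual risk, the following is a rough sketch of what pairing by `notesSlideN.xml` numbering amounts to. It is not the code in `PPTXAdapter`; the function names `partNumber` and `pairNotesByNumber` are hypothetical, and the entry paths assume the conventional `ppt/slides/` and `ppt/notesSlides/` layout.

```swift
// Hypothetical sketch of number-based pairing. Not the adapter's actual code.

/// Extracts N from a path such as "ppt/slides/slide12.xml" for the prefix "ppt/slides/slide".
func partNumber(of path: String, prefix: String) -> Int? {
    guard path.hasPrefix(prefix), path.hasSuffix(".xml") else { return nil }
    let digits = path.dropFirst(prefix.count).dropLast(4)  // strip prefix and ".xml"
    guard !digits.isEmpty, digits.allSatisfy({ $0.isASCII && $0.isNumber }) else { return nil }
    return Int(digits)
}

/// Pairs each slide entry with the notes entry that shares its number, if any.
/// Results are ordered by slide number.
func pairNotesByNumber(entries: [String]) -> [(slide: String, notes: String?)] {
    var notesByNumber: [Int: String] = [:]
    for path in entries {
        if let n = partNumber(of: path, prefix: "ppt/notesSlides/notesSlide") {
            notesByNumber[n] = path
        }
    }
    return entries
        .compactMap { path -> (Int, String)? in
            guard let n = partNumber(of: path, prefix: "ppt/slides/slide") else { return nil }
            return (n, path)
        }
        .sorted { $0.0 < $1.0 }
        .map { (slide: $0.1, notes: notesByNumber[$0.0]) }
}
```

The risk is that OpenXML ties a slide to its notes part through the slide's relationship file (`ppt/slides/_rels/slideN.xml.rels`), not through matching part numbers, so a deck whose notes parts are numbered differently from its slides would be paired incorrectly by a scheme like this.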