feat: structured message parsing (parseaple) on inbound events #44

Closed
VanditKumar (KumarVandit) wants to merge 2 commits into photon-hq:main from KumarVandit:feat/parseaple

Conversation

@KumarVandit
Member

@KumarVandit VanditKumar (KumarVandit) commented Apr 5, 2026

Adds a typed parse layer for MessageResponse: parse() in lib/parseaple/, optional msg.parsed on incoming socket events (new-message, updated-message, message-updated) and on missed-message recovery after reconnect. Parse runs in try/catch so it can't break delivery. parse / classify / describe / parseVCard are exported from the package entry for REST payloads too.

The problem: every developer building on the SDK ends up writing the same branching logic — checking balloonBundleId, attachment mime types, associatedMessageType, payload blobs, itemType, etc. just to figure out "is this a reaction or an edit or a contact card." That's hours of work per integration, and agents consuming these events get a wall of unstructured fields that burn tokens and lack context.

Before:

sdk.on("new-message", (msg) => {
  if (msg.associatedMessageType != null && msg.associatedMessageType !== 0) {
    // probably a reaction? maybe. check the numeric code...
    const reactionMap = { 2000: "love", 2001: "like", /* ... */ };
  }
  if (msg.balloonBundleId === "com.apple.messages.URLBalloonProvider") {
    // rich link, dig into payloadData...
  }
  // 50 more lines of this
});

After:

sdk.on("new-message", (msg) => {
  const p = msg.parsed;
  if (!p) return;

  switch (p.type) {
    case "reaction":
      console.log(`${p.reaction} on ${p.targetMessageGuid}`);
      break;
    case "rich-link":
      console.log(`Link: ${p.url} (${p.title})`);
      break;
    case "contact":
      console.log(`${p.fullName}, ${p.phones.join(", ")}`);
      break;
    case "edit":
      console.log(`Edited from "${p.originalText}" to "${p.newText}"`);
      break;
  }
});

// or just get a one-liner summary
const info = describe(msg.parsed);
// → { summary: "Reacted with love on MSG-GUID", type: "reaction", data: { ... } }

Agents get structured data directly — type, fields, human-readable summary — instead of reverse-engineering Apple's wire format every time.

What changed:

  • lib/parseaple/* — parse, classify, describe, vcard, types
  • client.ts — attaches msg.parsed on socket events + reconnect recovery
  • index.ts — exports the new APIs
  • types/message.ts — parsed?: ParsedMessage on MessageResponse

Tested against live traffic — text, replies, edits, reactions, contact cards, polls, rich links, attachments, location sharing, unsend. Reply detection distinguishes real inline replies from server-side chaining. Edit parsing pulls original text. Contact parsing extracts full vCard fields when attachment data is available.

Summary by CodeRabbit

  • New Features
    • Added automatic enrichment of messages with structured parsed data
    • Introduced APIs for message type classification and description generation
    • Extended message parsing to support multiple content types including text, media, contacts, locations, reactions, and system messages
    • Integrated parsed message recovery during connection restoration

- Add lib/parseaple with parse/classify/describe and vCard helpers
- Populate MessageResponse.parsed for new-message, updated-message, and recovery
- Re-export parsing APIs from the package entrypoint
@KumarVandit
Member Author

VanditKumar (KumarVandit) commented Apr 5, 2026

Reactions

Before — you decode associatedMessageType and free-text yourself (numbers mean add vs remove, classic tapback vs emoji, etc.):

const msg = {
  guid: "550e8400-e29b-41d4-a716-446655440000",
  associatedMessageGuid: "p:0/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  associatedMessageType: "2003",
  text: "‼️ Laughed at \"meet at 5\"",
};
const target = msg.associatedMessageGuid;
const code = Number(msg.associatedMessageType);
const removing = code >= 3000 && code < 3006;
// still need a table from 2000/3000 ranges to "love" | "laugh" | sticker | emoji…

After:

if (msg.parsed?.type === "reaction") {
  const { targetMessageGuid, reaction, emoji, isRemoval } = msg.parsed;
  // e.g. reaction: "laugh", isRemoval: false, targetMessageGuid: "p:0/aaaa…"
}

Edits

Before:

if (msg.dateEdited) {
  const body = msg.text ?? "";
  // no stable split of "new text" vs metadata without your own history
}

After:

if (msg.parsed?.type === "edit") {
  const { newText, editedAt } = msg.parsed;
}

Polls

Before:

if (msg.isPoll) {
  const line = msg.text ?? "";
  // options often live under payloadData / balloons, not one field
}

After:

if (msg.parsed?.type === "poll") {
  const { title, options, isPollVote } = msg.parsed;
}

Routing

Before:

const bid = msg.balloonBundleId ?? "";
if (msg.associatedMessageGuid && msg.associatedMessageType) { /* reaction */ }
else if (msg.isPoll) { /* poll */ }
else if (bid.includes("com.apple")) { /* balloon app */ }
else if (msg.attachments?.length) { /* media */ }

After:

switch (msg.parsed?.type) {
  case "reaction":
  case "poll":
  case "image":
  case "text":
  case "unknown":
    break;
}

REST

Before: same branching on each row from getMessages().

After: parse(row) (exported from the package root) when you need a ParsedMessage.

Socket

parsed is set on the payload for new-message, updated-message, and message-updated (client.ts, wrapped in try/catch so emit always runs).
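
A minimal sketch of that enrichment pattern, assuming a simplified message shape. `RawMessage`, `Parser`, and `enrich` are hypothetical names for illustration, not the code in client.ts. The point is only that a parser failure leaves `parsed` unset and the payload still flows through to the emit.

```typescript
// Hypothetical minimal shape; the real MessageResponse has many more fields.
interface RawMessage {
  guid: string;
  text?: string | null;
  parsed?: unknown;
}

type Parser = (msg: RawMessage) => unknown;

// Attach `parsed` when parsing succeeds. Never throws, so the caller can always emit.
function enrich(data: unknown, parser: Parser): unknown {
  if (typeof data === "object" && data !== null) {
    try {
      (data as RawMessage).parsed = parser(data as RawMessage);
    } catch {
      // Swallow the error: delivery matters more than enrichment.
    }
  }
  return data;
}

const ok = enrich({ guid: "A" }, (m) => ({ type: "text", guid: m.guid })) as RawMessage;
const failed = enrich({ guid: "B" }, () => {
  throw new Error("bad payload");
}) as RawMessage;
```

Consumers should therefore treat `msg.parsed` as optional and fall back to the raw fields when it is missing.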

@coderabbitai

coderabbitai Bot commented Apr 5, 2026

📝 Walkthrough

A new message parsing library (lib/parseaple) is introduced that converts raw message responses into structured, typed data. The library classifies messages, extracts type-specific fields, generates human-readable descriptions, and parses vCard contact data. Client integration automatically enriches received and recovered messages with parsed data.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Parseaple Library Core: lib/parseaple/types.ts, lib/parseaple/index.ts | Defines a comprehensive type hierarchy for parsed messages, including base fields and message-type-specific interfaces (text, media, reactions, edits, polls, system, etc.), and re-exports public APIs. |
| Message Parsing Logic: lib/parseaple/parse.ts | Implements message classification and conversion from MessageResponse to ParsedMessage, with dedicated handlers for each message type including text styling, media metadata, reaction/edit/unsend tracking, and location/collaboration/contact resolution. |
| Message Description: lib/parseaple/describe.ts | Generates human-readable summaries and extracts type-specific data fields from parsed messages, with formatting helpers for truncation, byte sizes, dimensions, and media metadata. |
| VCard Parsing: lib/parseaple/vcard.ts | Parses vCard format contact data including standard fields (phones, emails, addresses), Apple extensions (social profiles, labels, dates), and embedded binary data (photos, logos). |
| Integration Layer: client.ts, index.ts, types/message.ts | Extends MessageResponse with optional parsed field; integrates parse() into socket event handlers and message recovery flow; exposes parseaple APIs in public module exports. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A parsing warren's been built with care,
Messages sorted by type throughout the air,
Texts and reactions and links galore,
vCards and check-ins and so much more,
Structured and typed, each message shines bright,
The parseaple library makes messaging right! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: structured message parsing (parseaple) on inbound events' accurately and specifically describes the main feature being added: a structured parsing layer for messages on socket inbound events. |


- replyToGuid only set for true inline replies (threadOriginatorGuid),
  not server-side chaining
- chatGuid cached so updated-message events without chats[] still resolve
- describe() includes reply context, original text for edits, target
  guid for reactions
- edit parse extracts originalText from messageSummaryInfo
- contact parsing expanded to pull all vCard fields (nickname, title,
  urls, addresses, birthday, note) when attachment data is available
- live test example (parseaple-live.ts) for manual smoke testing
@KumarVandit
Member Author

Before / After by message type


Normal text message

Before:

sdk.on("new-message", (msg) => {
  if (!msg.associatedMessageGuid && !msg.balloonBundleId && !msg.attachments?.length) {
    const text = msg.text ?? "";
    // probably a plain text message... unless it's a system event, poll, etc.
  }
});

After:

if (msg.parsed?.type === "text") {
  const { text, styles, effect } = msg.parsed;
  // text: "hey what's up"
  // styles: [{ start: 0, end: 3, bold: true }]  (if formatted)
  // effect: "fireworks"  (if sent with screen effect)
}

Reply (inline thread)

Before:

// replyToGuid is set on almost every bubble (server chaining), so you can't trust it
// you need to check threadOriginatorGuid + threadOriginatorPart to know if it's a real reply
if (msg.threadOriginatorGuid || msg.threadOriginatorPart) {
  const replyTarget = msg.replyToGuid; // maybe?
}

After:

// replyToGuid is only set on actual inline replies, not server chaining
if (msg.parsed?.replyToGuid) {
  console.log(`This is a reply to ${msg.parsed.replyToGuid}`);
  // threadGuid also available for the thread fork id
}
// works on any message type — text, image, etc.

Image / photo

Before:

const att = msg.attachments?.[0];
if (att?.mimeType?.startsWith("image/") && !att.mimeType.includes("gif")) {
  const width = att.width;
  const height = att.height;
  const size = att.totalBytes;
  // is it a HEIC? PNG? need to check mimeType or uti yourself
}

After:

if (msg.parsed?.type === "image") {
  const { attachment, dimensions, subtype } = msg.parsed;
  // attachment: { guid, mimeType: "image/jpeg", fileName, sizeBytes, uti }
  // dimensions: { width: 3024, height: 4032 }
  // subtype: "jpeg"
}

File (PDF, document, etc.)

Before:

const att = msg.attachments?.[0];
if (att && !att.mimeType?.startsWith("image/") && !att.mimeType?.startsWith("video/") 
    && !att.mimeType?.startsWith("audio/")) {
  // probably a file... but could be a contact card (.vcf) or sticker
  const name = att.transferName;
}

After:

if (msg.parsed?.type === "file") {
  const { attachment } = msg.parsed;
  // attachment: { guid, mimeType: "application/pdf", fileName: "invoice.pdf", sizeBytes: 42000 }
}
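
For illustration, the attachment routing that the "Before" snippets do by hand could be centralized roughly like this. This is a sketch under assumptions: `AttachmentKind`, `AttachmentLike`, and `classifyAttachment` are hypothetical, and the kind names other than "image", "file", and "contact" are not confirmed by this PR.

```typescript
type AttachmentKind = "image" | "gif" | "video" | "audio" | "contact" | "file";

interface AttachmentLike {
  mimeType?: string | null;
  transferName?: string | null;
}

function classifyAttachment(att: AttachmentLike): AttachmentKind {
  const mime = (att.mimeType ?? "").toLowerCase();
  const name = (att.transferName ?? "").toLowerCase();
  // Check contact cards first so a .vcf never falls through to "file".
  if (mime === "text/vcard" || mime === "text/x-vcard" || name.endsWith(".vcf")) return "contact";
  // GIFs are split out before the generic image check.
  if (mime === "image/gif") return "gif";
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("video/")) return "video";
  if (mime.startsWith("audio/")) return "audio";
  return "file";
}

const kinds = [
  classifyAttachment({ mimeType: "application/pdf", transferName: "invoice.pdf" }),
  classifyAttachment({ mimeType: "image/jpeg", transferName: "IMG_0001.jpeg" }),
  classifyAttachment({ mimeType: null, transferName: "John Doe.vcf" }),
];
```

Ordering matters: the most specific checks run first and "file" is only the fallback.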

Contact card

Before:

const att = msg.attachments?.[0];
if (att?.mimeType === "text/vcard" || att?.transferName?.endsWith(".vcf")) {
  const name = att.transferName?.replace(".vcf", "");
  // phones? emails? you need to fetch the attachment file and parse the vCard yourself
}

After:

if (msg.parsed?.type === "contact") {
  const { fullName, firstName, lastName, phones, emails, org, nickname, 
          title, urls, addresses, birthday, note, attachmentGuid } = msg.parsed;
  // fullName: "John Doe"
  // phones: ["+1 555-1234"]
  // emails: ["john@example.com"]
  // org: "Acme Inc"
  // When attachment.data is present (REST), all vCard fields are extracted.
  // On socket events, use attachmentGuid to fetch full vCard via REST + parseVCard().
}

Reaction (tapback / emoji)

Before:

if (msg.associatedMessageGuid && msg.associatedMessageType) {
  const code = Number(msg.associatedMessageType);
  const isRemoval = code >= 3000;
  const base = isRemoval ? code - 1000 : code;
  const map: Record<number, string> = { 2000: "love", 2001: "like", 2002: "dislike", 2003: "laugh", 2004: "emphasize", 2005: "question" };
  const reaction = map[base] ?? "unknown";
  // emoji reactions? different code path, parse from text field
  const target = msg.associatedMessageGuid; // "p:0/GUID" — need to split the part index yourself
}

After:

if (msg.parsed?.type === "reaction") {
  const { reaction, emoji, isRemoval, targetMessageGuid, targetPart } = msg.parsed;
  // reaction: "laugh", emoji: undefined (classic) or "🔥" (emoji reaction)
  // isRemoval: false
  // targetMessageGuid: "p:0/AAAA-BBBB-..."
  // targetPart: 0
}
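
To make the decoding concrete, here is a small self-contained sketch built only from the code table in the "Before" snippet (2000 to 2005 add a tapback, the matching 3000 range removes it) plus the "p:0/GUID" part prefix. `decodeReaction` and its return shape are hypothetical. In particular, this sketch strips the part prefix into a separate `targetGuid`, whereas the example above suggests the PR's `targetMessageGuid` keeps it. Emoji reactions and stickers are not handled.

```typescript
type Tapback = "love" | "like" | "dislike" | "laugh" | "emphasize" | "question";

const TAPBACKS: Record<number, Tapback> = {
  2000: "love",
  2001: "like",
  2002: "dislike",
  2003: "laugh",
  2004: "emphasize",
  2005: "question",
};

interface DecodedReaction {
  reaction: Tapback | "unknown";
  isRemoval: boolean;
  targetGuid: string;
  targetPart: number | undefined;
}

function decodeReaction(
  associatedMessageType: string | number,
  associatedMessageGuid: string,
): DecodedReaction {
  const code = Number(associatedMessageType);
  const isRemoval = code >= 3000;
  const base = isRemoval ? code - 1000 : code;
  // "p:0/GUID" splits into part 0 and GUID; anything else passes through unchanged.
  const m = /^p:(\d+)\/(.+)$/.exec(associatedMessageGuid);
  return {
    reaction: TAPBACKS[base] ?? "unknown",
    isRemoval,
    targetGuid: m?.[2] ?? associatedMessageGuid,
    targetPart: m ? Number(m[1]) : undefined,
  };
}

const added = decodeReaction("2003", "p:0/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee");
const removed = decodeReaction(3000, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee");
```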

Edit message

Before:

if (msg.dateEdited) {
  const newText = msg.text ?? "";
  // what was the original text? not directly available — you'd need to 
  // track message history yourself or dig into messageSummaryInfo
}

After:

if (msg.parsed?.type === "edit") {
  const { originalText, newText, editedAt } = msg.parsed;
  // originalText: "hello wordl"  (extracted from messageSummaryInfo when available)
  // newText: "hello world"
  // editedAt: 2026-04-06T00:22:55.824Z
}

Location sharing

Before:

if (msg.balloonBundleId === "com.apple.findmy.FindMyMessagingBalloonExtension:FindMyRelay") {
  // dig into payloadData to find coordinates, action, duration...
  const payload = msg.payloadData; // nested structure, varies by action
  // is it a share? request? cancel? check the action string
  // coordinates buried under messageData.data or similar
}

After:

if (msg.parsed?.type === "location-share") {
  const { action, kind, duration, coordinates, address, mapsUrl } = msg.parsed;
  // kind: "share" | "request" | "unknown"
  // duration: "oneHour" | "indefinitely" | "untilEndOfDay"
  // coordinates: { lat: 28.6139, lng: 77.2090 }
  // address: { short: "New Delhi", long: "Connaught Place, New Delhi, India" }
  // mapsUrl: "https://maps.apple.com/?ll=28.6139,77.2090"
}
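
As a small illustration of the `mapsUrl` field, a link in the format shown above can be derived from the coordinates alone. `appleMapsUrl` is a hypothetical helper, and the four-decimal formatting is an assumption chosen to match the example.

```typescript
interface Coordinates {
  lat: number;
  lng: number;
}

// Builds a link in the same "?ll=lat,lng" format as the example above.
function appleMapsUrl({ lat, lng }: Coordinates, digits = 4): string {
  return `https://maps.apple.com/?ll=${lat.toFixed(digits)},${lng.toFixed(digits)}`;
}

const url = appleMapsUrl({ lat: 28.6139, lng: 77.209 });
// url === "https://maps.apple.com/?ll=28.6139,77.2090"
```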

Check-in (start / end / cancel)

Before:

if (msg.balloonBundleId?.includes("CheckIn")) {
  // payloadData has a deep nested structure with session info
  // mode (timer vs destination), status, estimated end time — all buried in blobs
  // different balloon IDs for different states
}

After:

if (msg.parsed?.type === "checkin") {
  const { mode, status, sessionId, startedAt, estimatedEndTime, 
          destinationName, lowPowerMode, shareUrl } = msg.parsed;
  // mode: "timer" | "destination" | "workout"
  // status: "started" | "ended" | "cancelled"
  // estimatedEndTime: 2026-04-06T02:00:00.000Z
  // destinationName: "Home"  (when mode is "destination")
}

@KumarVandit VanditKumar (KumarVandit) marked this pull request as ready for review April 6, 2026 00:56

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
lib/parseaple/parse.ts (1)

172-187: Module-level cache may accumulate across SDK instances.

The messageGuidToChatGuid cache is module-scoped, meaning it persists across SDK reconnects and is shared if multiple AdvancedIMessageKit instances exist. While the 4096 limit bounds memory, the FIFO eviction may evict recently relevant entries in high-volume scenarios.

Consider whether the cache should be instance-scoped (passed via closure or parameter) if multiple SDK instances are a supported use case. For single-instance usage, this is acceptable.
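
One way the instance-scoped variant could look, as a sketch only. `BoundedFifoCache` is a hypothetical class, not the PR's `messageGuidToChatGuid` code. It keeps the same bounded FIFO behaviour, relying on `Map` preserving insertion order, but each SDK instance would own its own cache object.

```typescript
class BoundedFifoCache<K, V> {
  private readonly map = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize = 4096) {
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    return this.map.get(key);
  }

  set(key: K, value: V): void {
    // Overwriting an existing key keeps its original position (FIFO, not LRU).
    if (!this.map.has(key) && this.map.size >= this.maxSize) {
      const first = this.map.keys().next();
      if (!first.done) this.map.delete(first.value);
    }
    this.map.set(key, value);
  }

  get size(): number {
    return this.map.size;
  }
}

// Each instance owns its cache, so nothing is shared between instances.
const chatGuidByMessage = new BoundedFifoCache<string, string>(2);
chatGuidByMessage.set("msg-1", "chat-A");
chatGuidByMessage.set("msg-2", "chat-A");
chatGuidByMessage.set("msg-3", "chat-B"); // evicts msg-1, the oldest entry
```

If recently used entries being evicted turns out to matter, an LRU policy (delete and re-insert on read) is a small change from here.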

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/parseaple/parse.ts` around lines 172 - 187, The module-scoped cache
messageGuidToChatGuid (and MAX_CHAT_GUID_CACHE) can leak across SDK reconnects
or multiple AdvancedIMessageKit instances; make the cache instance-scoped by
moving it into the AdvancedIMessageKit (or the object that owns resolveChatGuid)
and update resolveChatGuid to use the instance property (or accept the cache as
a parameter) instead of the module Map so each SDK instance has its own bounded
FIFO cache; preserve the eviction logic and MAX_CHAT_GUID_CACHE constant (or
make it an instance config) when moving the storage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: be8ef503-8fac-4591-bbce-3bf0d27d2f18

📥 Commits

Reviewing files that changed from the base of the PR and between b246c46 and 5e9ae41.

📒 Files selected for processing (8)
  • client.ts
  • index.ts
  • lib/parseaple/describe.ts
  • lib/parseaple/index.ts
  • lib/parseaple/parse.ts
  • lib/parseaple/types.ts
  • lib/parseaple/vcard.ts
  • types/message.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)

**/*.{ts,tsx,js,jsx}: Use bun <file> instead of node <file> or ts-node <file> for running TypeScript and JavaScript files
Bun automatically loads .env, so don't use dotenv library
Use Bun.serve() with WebSockets, HTTPS, and routes instead of express
Use bun:sqlite for SQLite database operations instead of better-sqlite3
Use Bun.redis for Redis operations instead of ioredis
Use Bun.sql for Postgres database operations instead of pg or postgres.js
Use built-in WebSocket instead of ws library
Prefer Bun.file over node:fs's readFile/writeFile for file operations
Use Bun.$ template tag for shell commands instead of execa library
CSS files can be imported directly in TypeScript/JavaScript/JSX files and will be bundled automatically by Bun

Files:

  • types/message.ts
  • index.ts
  • lib/parseaple/index.ts
  • client.ts
  • lib/parseaple/describe.ts
  • lib/parseaple/parse.ts
  • lib/parseaple/vcard.ts
  • lib/parseaple/types.ts
**/*.{html,ts,tsx,css}

📄 CodeRabbit inference engine (.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)

Use bun build <file.html|file.ts|file.css> instead of webpack or esbuild for bundling

Files:

  • types/message.ts
  • index.ts
  • lib/parseaple/index.ts
  • client.ts
  • lib/parseaple/describe.ts
  • lib/parseaple/parse.ts
  • lib/parseaple/vcard.ts
  • lib/parseaple/types.ts
**/*.{html,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)

Use HTML imports with Bun.serve() for frontend instead of vite

Files:

  • types/message.ts
  • index.ts
  • lib/parseaple/index.ts
  • client.ts
  • lib/parseaple/describe.ts
  • lib/parseaple/parse.ts
  • lib/parseaple/vcard.ts
  • lib/parseaple/types.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/use-bun-instead-of-node-vite-npm-pnpm.mdc)

Run TypeScript server files with bun --hot <file.ts> for hot module reloading during development

Files:

  • types/message.ts
  • index.ts
  • lib/parseaple/index.ts
  • client.ts
  • lib/parseaple/describe.ts
  • lib/parseaple/parse.ts
  • lib/parseaple/vcard.ts
  • lib/parseaple/types.ts
🔇 Additional comments (12)
types/message.ts (1)

1-1: LGTM!

The optional parsed?: ParsedMessage field is correctly typed and properly imported. Making it optional aligns with the try/catch pattern in client.ts where parsing failures are swallowed.

Also applies to: 156-156

index.ts (1)

6-7: LGTM!

Clean barrel exports that expose the parsing API for both socket events (automatic enrichment) and REST payloads (manual usage). The export type * ensures only types are re-exported without runtime overhead.

lib/parseaple/index.ts (1)

1-4: LGTM!

Well-organized barrel exports providing a clean public API for the parsing module.

client.ts (2)

259-274: LGTM!

The try/catch pattern ensures parsing failures never block message delivery. The type casts are necessary due to the generic args handling and are guarded by the typeof data === "object" check.


369-374: LGTM!

Consistent enrichment pattern for recovered messages, maintaining the same defensive try/catch approach used in the socket event handler.

lib/parseaple/describe.ts (1)

42-114: LGTM!

The summarize function provides exhaustive coverage of all ParsedMessage types with well-formatted human-readable summaries. TypeScript's narrowing ensures type safety within each case.

lib/parseaple/parse.ts (2)

580-581: Synchronous zlib decompression is acceptable here.

inflateRawSync blocks the event loop but is wrapped in try/catch and location-share messages are infrequent. If profiling shows this becoming a bottleneck, consider making parse() async and using inflateRaw instead.
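
If that switch is ever needed, the non-blocking form is a small change with Node's built-in zlib. The sketch below assumes the payload is raw-deflate-compressed UTF-8 text, which is an assumption for illustration. `decodePayload` is a hypothetical helper, not a function from parse.ts.

```typescript
import { promisify } from "node:util";
import { deflateRawSync, inflateRaw } from "node:zlib";

const inflateRawAsync = promisify(inflateRaw);

// Decompress off the main thread; return undefined instead of throwing on bad input.
async function decodePayload(compressed: Uint8Array): Promise<string | undefined> {
  try {
    const raw = await inflateRawAsync(compressed);
    return raw.toString("utf8");
  } catch {
    return undefined;
  }
}

// Round-trip a sample so the sketch can be exercised without real traffic.
const sample = deflateRawSync(Buffer.from('{"lat":28.6139}', "utf8"));
const decoded = decodePayload(sample);
```

The trade-off is that `parse()` and everything calling it would have to become async as well.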


756-800: LGTM!

The parse function provides clean routing to type-specific handlers with an exhaustive switch. The design ensures all MessageType values are handled, and TypeScript will catch any missing cases if the union is extended.

lib/parseaple/vcard.ts (2)

49-211: LGTM!

Comprehensive vCard parser covering standard fields and Apple extensions. The line unfolding, field splitting, and label application logic correctly handle the vCard format quirks.
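
For readers unfamiliar with the format, line unfolding and field splitting look roughly like this. It is a simplified sketch, not the code in vcard.ts. `unfoldVCard` and `splitProperty` are hypothetical names, and quoted parameter values containing ":" as well as Apple's "item1." group prefixes are deliberately not handled.

```typescript
// A line break followed by a single space or tab continues the previous line.
function unfoldVCard(raw: string): string[] {
  return raw
    .replace(/\r\n?/g, "\n")
    .replace(/\n[ \t]/g, "")
    .split("\n")
    .filter((line) => line.length > 0);
}

// Split "NAME;PARAM=...:value" at the first colon.
function splitProperty(line: string): { name: string; params: string[]; value: string } {
  const colon = line.indexOf(":");
  const head = colon === -1 ? line : line.slice(0, colon);
  const value = colon === -1 ? "" : line.slice(colon + 1);
  const [name = "", ...params] = head.split(";");
  return { name: name.toUpperCase(), params, value };
}

const card = "BEGIN:VCARD\r\nFN:John\r\n  Doe\r\nTEL;TYPE=CELL:+1 555-1234\r\nEND:VCARD\r\n";
const lines = unfoldVCard(card);
const tel = splitProperty(lines[2] ?? "");
```

In the sample, the folded `FN` value is rejoined as "John Doe": the first space after the line break is the fold marker and the second is content.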


255-265: LGTM!

Magic byte detection provides reasonable MIME type inference when the vCard doesn't specify the image type. The fallback to "image/unknown" is appropriate for unrecognized formats.
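
The idea can be sketched as follows, using the well-known JPEG, PNG, and GIF signatures. `sniffImageMime` and `sniffBase64Photo` are hypothetical names, and the set of formats the PR actually checks may differ.

```typescript
// Compare the leading bytes against known file signatures.
function sniffImageMime(bytes: Uint8Array): string {
  const startsWith = (sig: number[]): boolean => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "image/png"; // "\x89PNG"
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  return "image/unknown";
}

// Embedded vCard photos are typically base64 text, so decode before sniffing.
function sniffBase64Photo(base64: string): string {
  return sniffImageMime(Buffer.from(base64, "base64"));
}

const pngMime = sniffBase64Photo("iVBORw0KGgo="); // base64 of the 8-byte PNG signature
const jpegMime = sniffImageMime(new Uint8Array([0xff, 0xd8, 0xff, 0xe0]));
```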

lib/parseaple/types.ts (2)

1-13: LGTM!

Well-documented ParsedBase interface with clear distinction between replyToGuid (real inline replies) and threadGuid (thread fork identifier). The discriminated union pattern via the type field is correctly set up.


188-207: LGTM!

The ParsedMessage discriminated union covers all 18 message types with proper type discrimination via the type literal fields. This enables exhaustive pattern matching in consumers.
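
What exhaustive matching means for consumers can be shown with a reduced three-member union. `Parsed`, `assertNever`, and `label` are stand-ins for illustration, not the PR's 18-member `ParsedMessage`. If a new member is added to the union and not handled, the `default` branch stops compiling.

```typescript
type Parsed =
  | { type: "text"; text: string }
  | { type: "reaction"; reaction: string; targetMessageGuid: string }
  | { type: "unknown" };

// Reaching this at runtime means a variant was not handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled parsed type: ${JSON.stringify(value)}`);
}

function label(p: Parsed): string {
  switch (p.type) {
    case "text":
      return `text: ${p.text}`;
    case "reaction":
      return `${p.reaction} on ${p.targetMessageGuid}`;
    case "unknown":
      return "unknown message";
    default:
      // p is narrowed to never here; a missing case is a compile-time error.
      return assertNever(p);
  }
}

const labels = [
  label({ type: "text", text: "hey" }),
  label({ type: "reaction", reaction: "love", targetMessageGuid: "MSG-GUID" }),
];
```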
