Skip to content

Conversation

kprocks86
Copy link

@kprocks86 kprocks86 commented Aug 26, 2025

This PR is a Fix of issue #1839

Where i found out that the logic which is used to validate the documentUploadSchema is bit faulty.
issues

  1. VERCEL_BLOB_HOST this variable is not present in the .env.example
  2. hostname returns the whole host example123.public.blob.vercel-storage.com and previouly was matching with startWith as you can see from the host includes is the best way to match that condition.
  3. in storageType the same logic is needed as for the url so made a new function and used at both the places.

Summary by CodeRabbit

  • New Features

    • Dub discount integration applied at checkout for eligible users.
    • Configurable Vercel Blob storage host via new environment variable.
  • Bug Fixes

    • Improved Notion URL handling (official, .notion.site and custom domains) to reduce false rejections when adding/updating Notion documents.
    • More reliable validation for uploads and updates, reducing user-facing errors.
  • Chores

    • Updated .env.example to include the Vercel Blob host variable.

@kprocks86 kprocks86 requested a review from mfts as a code owner August 26, 2025 15:02
Copy link
Contributor

coderabbitai bot commented Aug 26, 2025

Walkthrough

Centralizes and extends URL/hostname validation (Notion official, .notion.site, custom domains, Vercel blob host via VERCEL_BLOB_HOST or vercel-storage.com) into lib/zod/url-validation.ts, adds Notion custom-domain page-id extraction/verification, introduces getDubDiscountForExternalUserId and applies it in billing checkout, and updates various endpoints to use async Zod validation.

Changes

Cohort / File(s) Summary
Environment variables
\.env.example
Added VERCEL_BLOB_HOST under Vercel Storage Blob section with default value vercel-storage.com.
URL validation & Zod schemas
lib/zod/url-validation.ts
Added validateUrlHostname(hostname) to centralize hostname checks (Notion official, .notion.site, Vercel blob via env fallback). Made Notion URL validation async, added and exported notionUrlUpdateSchema, and updated documentUpload logic to handle custom Notion domains and VERCEL_BLOB handling.
Notion utilities
lib/notion/utils.ts
Added extractPageIdFromCustomNotionUrl(url) and isCustomNotionDomain(url); extended getNotionPageIdFromSlug(url) to support official, .notion.site, and custom domains with existence checks.
Document UI & processing
components/documents/add-document-modal.tsx, lib/api/documents/process-document.ts
Renamed and refactored Notion page-id handling: store parsePageId result, fallback to getNotionPageIdFromSlug, ensure pageId is string
API: document endpoints
pages/api/teams/[teamId]/documents/[id]/update-notion-url.ts, pages/api/teams/[teamId]/documents/versions/index.ts, pages/api/teams/[teamId]/documents/agreement.ts, pages/api/teams/[teamId]/documents/index.ts
Switched Zod validation from sync safeParse to async safeParseAsync and adopted notionUrlUpdateSchema in update-notion-url endpoint, replacing manual URL validation with schema-driven checks. Other endpoints now await async validation.
Dub integration & billing
lib/dub.ts, pages/api/teams/[teamId]/billing/upgrade.ts
New export getDubDiscountForExternalUserId(externalId) queries Dub API and returns coupon discount if present; billing upgrade imports and applies the Dub discount to Stripe Checkout payloads and consolidates analytics/identify waitUntil into Promise.all.

Sequence Diagram(s)

%% Notion URL validation and page-id resolution (high-level)
sequenceDiagram
  autonumber
  actor Client
  participant API as Server Endpoint
  participant Zod as url-validation
  participant NotionUtils as lib/notion/utils
  participant Env as Env (VERCEL_BLOB_HOST)

  Client->>API: submit notion URL
  API->>Zod: notionUrlUpdateSchema.safeParseAsync(url)
  activate Zod
  Zod->>Env: read VERCEL_BLOB_HOST
  Env-->>Zod: host or undefined
  note right of Zod #eef2ff: validateUrlHostname checks\n- official Notion hostnames\n- endsWith(".notion.site")\n- Vercel blob host via env/fallback
  alt hostname is standard Notion
    Zod-->>API: valid (may parse pageId via parsePageId)
  else custom or non-standard hostname
    Zod->>NotionUtils: getNotionPageIdFromSlug(url)
    NotionUtils-->>Zod: pageId or null
    alt pageId found
      Zod-->>API: valid
    else
      Zod-->>API: invalid (400)
    end
  end
  deactivate Zod
Loading
%% Billing upgrade: applying Dub discount (high-level)
sequenceDiagram
  autonumber
  actor User
  participant Server as billing/upgrade API
  participant Dub as lib/dub.getDubDiscountForExternalUserId
  participant Stripe as Stripe Checkout

  User->>Server: request upgrade
  Server->>Dub: getDubDiscountForExternalUserId(user.externalId)
  Dub-->>Server: dubDiscount or null
  Server->>Stripe: createCheckoutSession(payload + dubDiscount?)
  Stripe-->>Server: session created
  Server-->>User: redirect/checkout URL
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

github-actions bot commented Aug 26, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Copy link

vercel bot commented Aug 26, 2025

@kprocks86 is attempting to deploy a commit to the mftsio Team on Vercel.

A member of the Team first needs to authorize it.

@kprocks86
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
.env.example (1)

14-14: Clarify what value VERCEL_BLOB_HOST expects (base domain suffix, no protocol).

Avoid confusion between full host and base domain. This also aligns with the validation logic that matches by domain suffix.

-VERCEL_BLOB_HOST=vercel-storage.com
+# Vercel Storage Blob base host used for validation (domain suffix only)
+# Example: vercel-storage.com  (do not include protocol, path, or a leading dot)
+VERCEL_BLOB_HOST=vercel-storage.com
lib/zod/url-validation.ts (5)

107-116: Update comment and error message to reflect that Vercel Blob HTTPS URLs are now accepted by filePathSchema.

The implementation now allows Vercel Blob hosts via validateUrlHostname, but the comment and error still imply Notion-only.

-        // Case 1: Notion URLs - must start with notion domains
+        // Case 1: Hosted URLs - must be Notion or Vercel Blob hosts
@@
-        message:
-          "File path must be either a Notion URL or an file storage path",
+        message:
+          "File path must be a Notion/Vercel Blob URL or a storage path",

Also applies to: 124-126


206-225: Deduplicate Notion-host checks; consider a small helper for consistency.

The Notion hostname logic is duplicated here instead of reusing a single helper. Extracting isNotionHostname(h: string) reduces drift and keeps behavior aligned if domains change.

Example helper to place above:

const isNotionHostname = (h: string): boolean => {
  const host = h.toLowerCase();
  return (
    host === "www.notion.so" ||
    host === "notion.so" ||
    host.endsWith(".notion.site")
  );
};

Then use:

-            const hostname = urlObj.hostname;
-            return (
-              hostname === "www.notion.so" ||
-              hostname === "notion.so" ||
-              hostname.endsWith(".notion.site")
-            );
+            return isNotionHostname(urlObj.hostname);

231-241: Comment no longer matches behavior.

This branch now accepts Vercel Blob (and Notion) hosts via validateUrlHostname, but the comment says “Must be a Notion URL”.

-          // Must be a Notion URL for VERCEL_BLOB
+          // Must be a Notion or Vercel Blob URL for VERCEL_BLOB

118-122: Avoid repeating the S3 path regex; extract a shared constant.

The same strict pattern appears in two places; centralize to reduce maintenance risk.

-        const s3PathPattern =
-          /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/;
+        const s3PathPattern = S3_PATH_PATTERN;

Add near the top of the file:

const S3_PATH_PATTERN =
  /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/;

And here:

-        return /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/.test(
-          data.url,
-        );
+        return S3_PATH_PATTERN.test(data.url);

Also applies to: 243-245


79-99: Add tests for hostname validation edge cases.

Target cases that would pass with includes but must be rejected with suffix checks.

Suggested cases:

  • Accept: example.public.blob.vercel-storage.com
  • Accept: public.blob.vercel-storage.com
  • Accept when env set to blob.vercel-storage.com: foo.blob.vercel-storage.com
  • Reject: vercel-storage.com.attacker.tld
  • Reject: notvercel-storage.com
  • Case-insensitivity: VERCEL-STORAGE.COM and MixedCase hosts
  • Notion: www.notion.so, notion.so, foo.notion.site

I can open a follow-up PR with these unit tests if helpful.

Also applies to: 112-116, 231-241

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d39a267 and 48a2501.

📒 Files selected for processing (2)
  • .env.example (1 hunks)
  • lib/zod/url-validation.ts (3 hunks)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
lib/zod/url-validation.ts (1)

89-104: Good fix: replaced substring matching with exact/suffix allowlist for Vercel Blob

Switching from includes(...) to exact match or dot-suffix checks closes the allowlist bypass (e.g., vercel-storage.com.attacker.tld). Nice improvement.

🧹 Nitpick comments (1)
lib/zod/url-validation.ts (1)

121-121: createFilePathValidator now accepts Vercel Blob URLs; update the stale comment and confirm intent

This now returns validateUrlHostname(hostname), which also allows Vercel Blob hosts, not just Notion. That’s likely intended, but the nearby comment “Case 1: Notion URLs - must start with notion domains” is now misleading, and the error message below still says “either a Notion URL or a file storage path.”

  • Please update the comment to reflect “trusted HTTPS URLs (Notion or Vercel Blob)”.
  • Consider aligning the error message accordingly.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 48a2501 and afb4571.

📒 Files selected for processing (1)
  • lib/zod/url-validation.ts (3 hunks)
🔇 Additional comments (1)
lib/zod/url-validation.ts (1)

246-246: Consistent reuse of hostname helper in storageType check

Good reuse of validateUrlHostname here; keeps the allowlist logic centralized and consistent.

Comment on lines +79 to +107
// Helper function to validate URL hostnames for different storage types
const validateUrlHostname = (hostname: string): boolean => {
// Valid notion domains
const validNotionDomains = ["www.notion.so", "notion.so"];

// Check for notion.site subdomains (e.g., example-something.notion.site)
const isNotionSite = hostname.endsWith(".notion.site");
const isValidNotionDomain = validNotionDomains.includes(hostname);

// Check for vercel blob storage
let isVercelBlob = false;

const normalizedHostname = hostname.toLowerCase().trim();

if (process.env.VERCEL_BLOB_HOST) {

const normalizedBlobHost = process.env.VERCEL_BLOB_HOST.toLowerCase().trim();
// Use exact match or suffix-with-dot to prevent bypasses
isVercelBlob = normalizedHostname === normalizedBlobHost ||
normalizedHostname.endsWith("." + normalizedBlobHost);
} else {
// Fallback: check for common Vercel Blob patterns if env var is not set
// Use exact match or suffix-with-dot to prevent bypasses
isVercelBlob = normalizedHostname === "vercel-storage.com" ||
normalizedHostname.endsWith(".vercel-storage.com");
}

return isNotionSite || isValidNotionDomain || isVercelBlob;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Lowercase and normalize hostnames for all checks; fix case-sensitivity and trailing-dot edge cases

Notion checks use the raw hostname while Vercel uses a normalized one. Hostnames are case-insensitive, and trailing dots are valid. Today, a URL like https://WWW.NOTION.SO/ or https://page.notion.site./ will be rejected, while https://foo.vercel-storage.com./ won’t pass either due to the trailing dot. Normalize once and reuse the same variable across all checks.

Apply this diff to make the checks robust and consistent:

-// Helper function to validate URL hostnames for different storage types
-const validateUrlHostname = (hostname: string): boolean => {
-  // Valid notion domains
-  const validNotionDomains = ["www.notion.so", "notion.so"];
-
-  // Check for notion.site subdomains (e.g., example-something.notion.site)
-  const isNotionSite = hostname.endsWith(".notion.site");
-  const isValidNotionDomain = validNotionDomains.includes(hostname);
-
-  // Check for vercel blob storage
-  let isVercelBlob = false;
-
-  const normalizedHostname = hostname.toLowerCase().trim();
-
-  if (process.env.VERCEL_BLOB_HOST) {
-
-    const normalizedBlobHost = process.env.VERCEL_BLOB_HOST.toLowerCase().trim();
-    // Use exact match or suffix-with-dot to prevent bypasses
-    isVercelBlob = normalizedHostname === normalizedBlobHost ||
-      normalizedHostname.endsWith("." + normalizedBlobHost);
-  } else {
-    // Fallback: check for common Vercel Blob patterns if env var is not set
-    // Use exact match or suffix-with-dot to prevent bypasses
-    isVercelBlob = normalizedHostname === "vercel-storage.com" ||
-      normalizedHostname.endsWith(".vercel-storage.com");
-  }
-
-  return isNotionSite || isValidNotionDomain || isVercelBlob;
-};
+// Helper function to validate URL hostnames for different storage types
+const validateUrlHostname = (hostname: string): boolean => {
+  // Normalize once: lowercase, trim, and strip leading/trailing dots
+  const h = hostname.toLowerCase().trim().replace(/^\.+|\.+$/g, "");
+
+  // Valid Notion domains
+  const validNotionDomains = ["www.notion.so", "notion.so"];
+
+  // Check for notion.site subdomains (e.g., example-something.notion.site)
+  const isNotionSite = h.endsWith(".notion.site");
+  const isValidNotionDomain = validNotionDomains.includes(h);
+
+  // Check for Vercel Blob storage by exact-or-suffix match
+  const base = (process.env.VERCEL_BLOB_HOST || "vercel-storage.com")
+    .toLowerCase()
+    .trim()
+    .replace(/^\.+|\.+$/g, "");
+  const isVercelBlob = h === base || h.endsWith(`.${base}`);
+
+  return isNotionSite || isValidNotionDomain || isVercelBlob;
+};

If you want, I can add unit tests covering:

  • Mixed-case hosts (WWW.NOTION.SO)
  • Trailing-dot hosts (page.notion.site., foo.vercel-storage.com.)
  • Env override (custom base with/without trailing dot)
  • Negative case (vercel-storage.com.attacker.tld)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Helper function to validate URL hostnames for different storage types
const validateUrlHostname = (hostname: string): boolean => {
// Valid notion domains
const validNotionDomains = ["www.notion.so", "notion.so"];
// Check for notion.site subdomains (e.g., example-something.notion.site)
const isNotionSite = hostname.endsWith(".notion.site");
const isValidNotionDomain = validNotionDomains.includes(hostname);
// Check for vercel blob storage
let isVercelBlob = false;
const normalizedHostname = hostname.toLowerCase().trim();
if (process.env.VERCEL_BLOB_HOST) {
const normalizedBlobHost = process.env.VERCEL_BLOB_HOST.toLowerCase().trim();
// Use exact match or suffix-with-dot to prevent bypasses
isVercelBlob = normalizedHostname === normalizedBlobHost ||
normalizedHostname.endsWith("." + normalizedBlobHost);
} else {
// Fallback: check for common Vercel Blob patterns if env var is not set
// Use exact match or suffix-with-dot to prevent bypasses
isVercelBlob = normalizedHostname === "vercel-storage.com" ||
normalizedHostname.endsWith(".vercel-storage.com");
}
return isNotionSite || isValidNotionDomain || isVercelBlob;
};
// Helper function to validate URL hostnames for different storage types
const validateUrlHostname = (hostname: string): boolean => {
// Normalize once: lowercase, trim whitespace, and strip leading/trailing dots
const h = hostname
.toLowerCase()
.trim()
.replace(/^\.+|\.+$/g, "");
// Valid Notion domains
const validNotionDomains = ["www.notion.so", "notion.so"];
// Check for notion.site subdomains (e.g., example-something.notion.site)
const isNotionSite = h.endsWith(".notion.site");
const isValidNotionDomain = validNotionDomains.includes(h);
// Check for Vercel Blob storage by exact-or-suffix match
const base = (process.env.VERCEL_BLOB_HOST || "vercel-storage.com")
.toLowerCase()
.trim()
.replace(/^\.+|\.+$/g, "");
const isVercelBlob = h === base || h.endsWith(`.${base}`);
return isNotionSite || isValidNotionDomain || isVercelBlob;
};
🤖 Prompt for AI Agents
In lib/zod/url-validation.ts around lines 79 to 107, the function currently
mixes raw and normalized hostnames causing case-sensitivity and trailing-dot
issues; normalize the input hostname once (lowercase, trim, and remove any
trailing dot) and use that normalizedHostname for all checks (notion domains,
notion.site suffix, and vercel blob checks); also normalize/process
process.env.VERCEL_BLOB_HOST the same way (lowercase, trim, remove trailing dot)
before comparing and continue to use exact-match or suffix-with-dot comparisons
(e.g., normalizedHostname === normalizedBlobHost ||
normalizedHostname.endsWith("." + normalizedBlobHost)) to avoid bypasses.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pages/api/teams/[teamId]/billing/upgrade.ts (1)

85-102: Dub discount is not applied for existing Stripe customers.

dubDiscount is only spread into the “new customer” session. Existing customers won’t get the discount, which likely contradicts intent.

Apply this diff to include the discount in the existing-customer flow:

       allow_promotion_codes: true,
-      client_reference_id: teamId,
+      client_reference_id: teamId,
+      ...(dubDiscount ?? {}),
♻️ Duplicate comments (3)
lib/zod/url-validation.ts (3)

135-139: Fix allowlist bypass: endsWith(process.env.VERCEL_BLOB_HOST) accepts sibling/attacker domains.

evilvercel-storage.com passes endsWith("vercel-storage.com"). Use exact-or-dot-bounded suffix with normalization and add a safe fallback when the env is unset. Also, reuse the helper to avoid divergence.

-            // Check for vercel blob storage
-            let isVercelBlob = false;
-            if (process.env.VERCEL_BLOB_HOST) {
-              isVercelBlob = hostname.endsWith(process.env.VERCEL_BLOB_HOST);
-            }
+            // Known hosts (Notion or Vercel Blob) short-circuit
+            const isKnownHost = validateUrlHostname(hostname);
+            const isVercelBlob = isKnownHost; // encompassed by helper

354-371: Repeat of the Vercel Blob allowlist bypass; also missing fallback when env is absent.

This duplicates the earlier issue: endsWith(process.env.VERCEL_BLOB_HOST) is unsafe and fails when the env var isn’t set (even for legitimate *.vercel-storage.com hosts). Reuse the hardened helper.

-            const isStandardNotion =
-              hostname === "www.notion.so" ||
-              hostname === "notion.so" ||
-              hostname.endsWith(".notion.site");
-
-            if (isStandardNotion) {
-              return true;
-            }
-
-            // Check for vercel blob storage
-            let isVercelBlob = false;
-            if (process.env.VERCEL_BLOB_HOST) {
-              isVercelBlob = hostname.endsWith(process.env.VERCEL_BLOB_HOST);
-              if (isVercelBlob) {
-                return true;
-              }
-            }
+            // Known hostnames (Notion or Vercel Blob) are accepted
+            if (validateUrlHostname(hostname)) {
+              return true;
+            }

84-112: Centralize and harden host checks; normalize everything and require dot-bounded suffix.

Great to see a helper; but:

  • Notion checks use raw hostname, causing case/trailing-dot edge cases.
  • Vercel check is correct here (exact or .-bounded suffix), but this helper isn’t reused below, leading to drift.

Normalize once, use it for all families, and then reuse this helper elsewhere in the file.

-const validateUrlHostname = (hostname: string): boolean => {
-  // Valid notion domains
-  const validNotionDomains = ["www.notion.so", "notion.so"];
-
-  // Check for notion.site subdomains (e.g., example-something.notion.site)
-  const isNotionSite = hostname.endsWith(".notion.site");
-  const isValidNotionDomain = validNotionDomains.includes(hostname);
-
-  // Check for vercel blob storage
-  let isVercelBlob = false;
-
-  const normalizedHostname = hostname.toLowerCase().trim();
-
-  if (process.env.VERCEL_BLOB_HOST) {
-
-    const normalizedBlobHost = process.env.VERCEL_BLOB_HOST.toLowerCase().trim();
-    // Use exact match or suffix-with-dot to prevent bypasses
-    isVercelBlob = normalizedHostname === normalizedBlobHost ||
-      normalizedHostname.endsWith("." + normalizedBlobHost);
-  } else {
-    // Fallback: check for common Vercel Blob patterns if env var is not set
-    // Use exact match or suffix-with-dot to prevent bypasses
-    isVercelBlob = normalizedHostname === "vercel-storage.com" ||
-      normalizedHostname.endsWith(".vercel-storage.com");
-  }
-
-  return isNotionSite || isValidNotionDomain || isVercelBlob;
-};
+const validateUrlHostname = (hostname: string): boolean => {
+  // Normalize once: lowercase, trim, and strip leading/trailing dots
+  const h = hostname.toLowerCase().trim().replace(/^\.+|\.+$/g, "");
+
+  // Notion
+  const validNotionDomains = ["www.notion.so", "notion.so"];
+  const isNotionSite = h.endsWith(".notion.site");
+  const isValidNotionDomain = validNotionDomains.includes(h);
+
+  // Vercel Blob: exact or dot-bounded suffix; allow fallback when env is absent
+  const base = (process.env.VERCEL_BLOB_HOST || "vercel-storage.com")
+    .toLowerCase()
+    .trim()
+    .replace(/^\.+|\.+$/g, "");
+  const isVercelBlob = h === base || h.endsWith(`.${base}`);
+
+  return isNotionSite || isValidNotionDomain || isVercelBlob;
+};
🧹 Nitpick comments (18)
lib/dub.ts (2)

6-14: Early-exit when DUB_API_KEY is unset and tolerate SDK list shapes; add a narrow return type.

Avoids unnecessary network calls and noisy logs when the key is missing, and handles both array and paginated shapes ({ data: [...] }) defensively. Also clarifies the return payload shape so it matches the Stripe param we spread into session creation.

@@
-export async function getDubDiscountForExternalUserId(externalId: string) {
+type StripeDiscountsParam = { discounts: { coupon: string }[] };
+
+export async function getDubDiscountForExternalUserId(
+  externalId: string,
+): Promise<StripeDiscountsParam | null> {
+  // Fast-fail if not configured to avoid noisy network errors in non-prod
+  if (!process.env.DUB_API_KEY) {
+    if (process.env.NODE_ENV !== "production") {
+      console.warn("DUB_API_KEY is not set; skipping Dub discount lookup");
+    }
+    return null;
+  }
   try {
-    const customers = await dub.customers.list({
+    const customers = await dub.customers.list({
       externalId,
       includeExpandedFields: true,
     });
-    const first = customers[0];
+    const list = Array.isArray(customers) ? customers : customers?.data ?? [];
+    const first = list[0];

19-23: Narrow the error log to avoid leaking sensitive data; add context.

Some SDK errors can include request details. Log only the message and relevant context to reduce risk.

-  } catch (err) {
-    console.warn("Skipping Dub discount due to API error", err);
+  } catch (err) {
+    console.warn("Skipping Dub discount due to API error", {
+      externalId,
+      message: err instanceof Error ? err.message : String(err),
+    });
pages/api/teams/[teamId]/billing/upgrade.ts (1)

85-102: Mirror metadata for observability parity with the new-customer path.

Optional, but adding metadata.dubCustomerId in the existing-customer branch keeps analytics/debugging consistent.

       allow_promotion_codes: true,
       client_reference_id: teamId,
+      metadata: {
+        dubCustomerId: userId,
+      },
pages/api/teams/[teamId]/documents/index.ts (1)

245-247: Use safeParseAsync: correct. Consider deriving type when absent for robustness

The async validation change is correct. To reduce client coupling, consider deriving type when it’s not provided (similar to agreement route). This avoids 400s if older clients omit type.

Apply:

-    const validationResult = await documentUploadSchema.safeParseAsync(
-      req.body,
-    );
+    const validationResult = await documentUploadSchema.safeParseAsync({
+      ...req.body,
+      // Derive type from URL/filename if client omitted it
+      type: req.body.type ?? (req.body.url ? getExtension(req.body.url) : getExtension(req.body.name)),
+    });

Outside the selected lines, update the import:

-import { log, serializeFileSize } from "@/lib/utils";
+import { log, serializeFileSize, getExtension } from "@/lib/utils";
pages/api/teams/[teamId]/documents/[id]/versions/index.ts (1)

35-38: Async validation: good. Optional: derive type if clients omit it

Switching to safeParseAsync is correct. If some clients don’t send type, consider deriving it from url to keep the endpoint backward compatible.

Apply:

-    const validationResult = await documentUploadSchema.safeParseAsync({
-      ...req.body,
-      name: `Version ${new Date().toISOString()}`, // Dummy name for validation
-    });
+    const validationResult = await documentUploadSchema.safeParseAsync({
+      ...req.body,
+      name: `Version ${new Date().toISOString()}`, // Dummy name for validation
+      type: req.body.type ?? getExtension(req.body.url),
+    });

And add the import outside the selected lines:

-import { log } from "@/lib/utils";
+import { log, getExtension } from "@/lib/utils";
lib/api/documents/process-document.ts (1)

71-72: Fix typo in user-facing error message (“publically” → “publicly”)

This is shown to users and appears in logs. Quick copy edit improves polish.

-      throw new Error("This Notion page isn't publically available.");
+      throw new Error("This Notion page isn't publicly available.");
pages/api/teams/[teamId]/documents/[id]/update-notion-url.ts (1)

28-36: Schema-driven validation with safeParseAsync is the right move; log invalid attempts

Using notionUrlUpdateSchema.safeParseAsync centralizes validation and SSRF checks. Consider logging invalid attempts for observability.

-  const validationResult = await notionUrlUpdateSchema.safeParseAsync(url);
-  if (!validationResult.success) {
-    return res.status(400).json({ message: validationResult.error.message });
-  }
+  const validationResult = await notionUrlUpdateSchema.safeParseAsync(url);
+  if (!validationResult.success) {
+    console.warn(
+      "[update-notion-url] Validation failed",
+      JSON.stringify(validationResult.error.errors ?? validationResult.error.issues ?? validationResult.error),
+    );
+    return res.status(400).json({ message: validationResult.error.message });
+  }
components/documents/add-document-modal.tsx (2)

408-421: Simplify ID derivation; avoid null-vs-undefined churn and double-naming.

Use a single pageId variable, coalesce parsePageId to undefined, and only call the async slug resolver if still falsy. This reduces branching and avoids reassigning undefined explicitly.

-    let validateNotionPageId = parsePageId(notionLink);
+    let pageId = parsePageId(notionLink) ?? undefined;

-    if (validateNotionPageId === null) {
+    if (!pageId) {
       try {
-        const pageId = await getNotionPageIdFromSlug(notionLink);
-        validateNotionPageId = pageId || undefined;
+        pageId = (await getNotionPageIdFromSlug(notionLink)) ?? undefined;
       } catch (slugError) {
         toast.error("Please enter a valid Notion link to proceed.");
         return;
       }
     }

-    if (!validateNotionPageId) {
+    if (!pageId) {
       toast.error("Please enter a valid Notion link to proceed.");
       return;
     }

408-421: Consider reusing the shared Zod notion URL schema client-side.

To keep client and server semantics identical, you can import notionUrlUpdateSchema and call safeParseAsync(notionLink) here, replacing bespoke parsing. This prevents drift as validation evolves.

Would you like me to wire notionUrlUpdateSchema.safeParseAsync into this handler and adjust the UI message handling?

lib/notion/utils.ts (3)

176-183: Remove forbidden User-Agent header for browser compatibility.

Browsers disallow setting the User-Agent header; this will be ignored or error in some environments. It’s unnecessary here.

   const response = await fetch(apiUrl, {
     method: "POST",
     headers: {
       "Content-Type": "application/json",
-      "User-Agent": "Mozilla/5.0 (compatible; MyApp/1.0)",
     },
     body: JSON.stringify(payload),
   });

82-101: Nice fallback path; consider decoding the pathname before regex.

parsePageId already handles most formats. For the regex fallback, decoding %2D-style encodings can help in edge cases.

-    const pathname = urlObj.pathname;
+    const pathname = decodeURIComponent(urlObj.pathname);

107-120: Network errors can yield false negatives for custom domains.

isCustomNotionDomain returns false on any error (including transient network issues), potentially rejecting valid URLs. Consider distinguishing unreachable vs. invalid: treat network timeouts/5xx as “unknown” and let server-side validation decide, or retry once.

I can add a conservative retry with exponential backoff and map 4xx to false, 5xx/network to true-with-warning or “unknown” depending on your preference.

lib/zod/url-validation.ts (6)

140-156: Deduplicate logic: call the helper first, then fall back to custom-domain Notion resolution.

This keeps all standard host logic in one place and reduces the chance of drift.

-            // If it's not a standard Notion domain or Vercel blob, check if it's a custom Notion domain
-            if (!isNotionSite && !isValidNotionDomain && !isVercelBlob) {
+            // If not a known host (Notion/Vercel Blob), try custom Notion domain resolution
+            if (!isKnownHost) {
               try {
                 let pageId = parsePageId(path);
                 if (!pageId) {
                   const pageIdFromSlug = await getNotionPageIdFromSlug(path);
                   if (pageIdFromSlug) {
                     pageId = pageIdFromSlug;
                   }
                 }
                 return !!pageId;
               } catch {
                 return false;
               }
             }
 
-            return isNotionSite || isValidNotionDomain || isVercelBlob;
+            return true;

168-171: Grammar nit: article usage.

“an file storage path” → “a file storage path”.

-          "File path must be either a Notion URL or an file storage path",
+          "File path must be either a Notion URL or a file storage path",

190-192: Scheme check should be case-insensitive and robust.

startsWith("https://") misses HTTPS://… and is bypassable with leading whitespace. Prefer parsing the URL and checking protocol === "https:".

-  .refine((url) => url.startsWith("https://"), {
+  .refine((url) => {
+    try {
+      return new URL(url).protocol === "https:";
+    } catch {
+      return false;
+    }
+  }, {
     message: "Notion URL must use HTTPS",
   })

186-247: Optional: use the helper in notionUrlUpdateSchema too.

Not strictly required, but calling validateUrlHostname(urlObj.hostname) before custom resolution would short-circuit common cases and keep semantics consistent across validators.

I can submit a follow-up patch applying the helper here and add unit tests to lock behavior.


248-394: Message clarity and consistency across branches.

Error messages between validators differ slightly; consider aligning them (e.g., “URL contains invalid characters or targets internal resources”) for a consistent UX.

Happy to unify copy across schemas and add i18n keys if you have a translations layer.


120-171: Add comprehensive domain normalization tests for URL validation

Please add unit tests covering both Notion URL and file–storage path validators to guard against mixed-case, trailing-dot, and malicious domains. Specifically, verify:

  • Mixed-case domains (e.g. https://WWW.NOTION.SO/)
  • Trailing-dot hosts (e.g. https://page.notion.site./abc)
  • Valid Vercel blob hosts (e.g. https://example123.public.blob.vercel-storage.com)
  • Invalid attacker variants (e.g. https://evilvercel-storage.com, https://vercel-storage.com.attacker.tld)
  • Behavior when process.env.VERCEL_BLOB_HOST is absent versus present, and with/without a trailing dot

Apply these tests to both validation blocks (currently at lines 120–171 and lines 312–394 in lib/zod/url-validation.ts).

Let me know if you’d like help scaffolding these tests.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled
  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between afb4571 and dc1cbfd.

📒 Files selected for processing (10)
  • components/documents/add-document-modal.tsx (1 hunks)
  • lib/api/documents/process-document.ts (1 hunks)
  • lib/dub.ts (1 hunks)
  • lib/notion/utils.ts (4 hunks)
  • lib/zod/url-validation.ts (6 hunks)
  • pages/api/teams/[teamId]/billing/upgrade.ts (3 hunks)
  • pages/api/teams/[teamId]/documents/[id]/update-notion-url.ts (4 hunks)
  • pages/api/teams/[teamId]/documents/[id]/versions/index.ts (1 hunks)
  • pages/api/teams/[teamId]/documents/agreement.ts (1 hunks)
  • pages/api/teams/[teamId]/documents/index.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
pages/api/teams/[teamId]/documents/index.ts (1)
lib/zod/url-validation.ts (1)
  • documentUploadSchema (249-394)
pages/api/teams/[teamId]/documents/agreement.ts (1)
lib/zod/url-validation.ts (1)
  • documentUploadSchema (249-394)
pages/api/teams/[teamId]/documents/[id]/versions/index.ts (1)
lib/zod/url-validation.ts (1)
  • documentUploadSchema (249-394)
components/documents/add-document-modal.tsx (1)
lib/notion/utils.ts (1)
  • getNotionPageIdFromSlug (122-198)
lib/api/documents/process-document.ts (1)
lib/notion/utils.ts (1)
  • getNotionPageIdFromSlug (122-198)
pages/api/teams/[teamId]/documents/[id]/update-notion-url.ts (1)
lib/zod/url-validation.ts (1)
  • notionUrlUpdateSchema (187-246)
pages/api/teams/[teamId]/billing/upgrade.ts (1)
lib/dub.ts (1)
  • getDubDiscountForExternalUserId (7-24)
lib/zod/url-validation.ts (1)
lib/notion/utils.ts (2)
  • getNotionPageIdFromSlug (122-198)
  • isCustomNotionDomain (107-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Socket Security: Pull Request Alerts
🔇 Additional comments (9)
lib/dub.ts (1)

15-17: Env-aware selection of test vs. live coupon looks good.

Choosing couponTestId in non-production and couponId in production aligns with safe environment separation.

pages/api/teams/[teamId]/billing/upgrade.ts (3)

119-123: LGTM: Discount spread + Dub metadata in new-customer flow.

The shape { discounts: [{ coupon }] } is correctly spread, and metadata.dubCustomerId improves traceability.


127-135: LGTM: Consolidated waitUntil with Promise.all.

Runs identification and analytics concurrently while keeping the response streaming-friendly.


80-81: Confirm Dub externalId Mapping Matches Your App’s userId

I didn’t find any calls in our codebase creating or updating Dub customers with an externalId field (your search for dub.customers.create|update and externalId: returned no matches). That means the only place we pass externalId into Dub is in getDubDiscountForExternalUserId(userId)—but we have no guarantee that Dub customers were ever provisioned with externalId set to our userId rather than, say, email or another identifier.

• Locate where Dub customers are created/updated (in your integration code or via your Dub dashboard) and confirm the externalId value is set to the same userId that you’re passing into getDubDiscountForExternalUserId.
• If your provisioning uses a different key (email, UUID, etc.), you’ll need to update either the provisioning logic or this lookup to prevent silent “no discount” cases.

pages/api/teams/[teamId]/documents/agreement.ts (1)

35-39: Good switch to safeParseAsync; fallback for type is a nice touch

Moving to safeParseAsync correctly supports async refinements in documentUploadSchema. Also, populating type from getExtension(req.body.name) when missing helps keep clients loosely coupled to the API.

pages/api/teams/[teamId]/documents/index.ts (1)

245-247: All Vercel Blob validations use .endsWith, Zod schemas use safeParseAsync, and VERCEL_BLOB_HOST is present
Verified that:

  • No instances of schema.safeParse( remain for documentUploadSchema or notionUrlUpdateSchema.
  • All host‐matching logic in lib/zod/url-validation.ts uses hostname.endsWith(process.env.VERCEL_BLOB_HOST) (no .includes usage).
  • .env.example defines VERCEL_BLOB_HOST=vercel-storage.com.
lib/api/documents/process-document.ts (1)

58-60: Minor type-safety cleanup looks good

Assigning via pageIdFromSlug and coercing to undefined avoids propagating null. No behavior changes; improves clarity.

pages/api/teams/[teamId]/documents/[id]/update-notion-url.ts (1)

97-101: Response shape is clear

Returning both a success message and newUrl is helpful for clients. LGTM.

lib/zod/url-validation.ts (1)

1-11: Imports look good; ensure tree-shaking doesn't pull server-only code into client bundles.

getNotionPageIdFromSlug and isCustomNotionDomain do network calls; they’re fine client-side, but watch for any notion server-only exports getting bundled inadvertently. Keep these functions free of server-only deps.

I can scan for cross-imports that might drag server-only code into the client. Would you like a quick audit script?

Comment on lines +122 to +137
export async function getNotionPageIdFromSlug(
url: string,
): Promise<string | null> {
// Parse the URL to extract domain and slug
const urlObj = new URL(url);
const hostname = urlObj.hostname;

const isNotionSo = hostname === "www.notion.so" || hostname === "notion.so";
const isNotionSite = hostname.endsWith(".notion.site");

// notion.so: extract ID from path directly
if (isNotionSo) {
const pageId = parsePageId(url) ?? parsePageId(urlObj.pathname);
if (pageId) return pageId;
throw new Error(`Unable to extract page ID from Notion URL: ${url}`);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Normalize hostnames (lowercase, strip trailing dot) before domain-family checks.

Hostnames are case-insensitive and can legally end with a trailing dot. Without normalization, HTTPS://WWW.NOTION.SO. would fail, and mixed-case hosts can bypass checks.

-export async function getNotionPageIdFromSlug(
+export async function getNotionPageIdFromSlug(
   url: string,
 ): Promise<string | null> {
   // Parse the URL to extract domain and slug
   const urlObj = new URL(url);
-  const hostname = urlObj.hostname;
+  const hostname = urlObj.hostname.toLowerCase().replace(/^\.+|\.+$/g, "");

-  const isNotionSo = hostname === "www.notion.so" || hostname === "notion.so";
-  const isNotionSite = hostname.endsWith(".notion.site");
+  const isNotionSo = hostname === "www.notion.so" || hostname === "notion.so";
+  const isNotionSite = hostname.endsWith(".notion.site");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function getNotionPageIdFromSlug(
url: string,
): Promise<string | null> {
// Parse the URL to extract domain and slug
const urlObj = new URL(url);
const hostname = urlObj.hostname;
const isNotionSo = hostname === "www.notion.so" || hostname === "notion.so";
const isNotionSite = hostname.endsWith(".notion.site");
// notion.so: extract ID from path directly
if (isNotionSo) {
const pageId = parsePageId(url) ?? parsePageId(urlObj.pathname);
if (pageId) return pageId;
throw new Error(`Unable to extract page ID from Notion URL: ${url}`);
}
export async function getNotionPageIdFromSlug(
url: string,
): Promise<string | null> {
// Parse the URL to extract domain and slug
const urlObj = new URL(url);
const hostname = urlObj.hostname.toLowerCase().replace(/^\.+|\.+$/g, "");
const isNotionSo = hostname === "www.notion.so" || hostname === "notion.so";
const isNotionSite = hostname.endsWith(".notion.site");
// notion.so: extract ID from path directly
if (isNotionSo) {
const pageId = parsePageId(url) ?? parsePageId(urlObj.pathname);
if (pageId) return pageId;
throw new Error(`Unable to extract page ID from Notion URL: ${url}`);
}
🤖 Prompt for AI Agents
In lib/notion/utils.ts around lines 122 to 137, the hostname is used directly to
decide Notion domain families which fails for mixed-case or trailing-dot
hostnames; normalize the hostname by lowercasing it and stripping any trailing
dot before doing comparisons, then use that normalized value for equality checks
("www.notion.so" / "notion.so") and for .endsWith(".notion.site") so URLs like
"HTTPS://WWW.NOTION.SO." or mixed-case hosts match correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants