Syntho: Security Guidelines

Security Context File (Paste alongside context.md at the top of every prompt session)


πŸ” Authentication & Authorization

Supabase JWT Verification

  • Every FastAPI route (except /health and /api/webhooks/*) MUST verify the Supabase JWT
  • Use the get_current_user dependency injected into every protected route
  • Never trust user_id from the request body; always extract it from the verified JWT
  • JWT secret lives ONLY in backend .env, never in frontend code
# backend/app/middleware/auth.py
# ALWAYS use this pattern; never skip auth on protected routes
from fastapi import Header, HTTPException
from jose import JWTError, jwt  # python-jose

# SUPABASE_JWT_SECRET is loaded from the backend .env
async def get_current_user(authorization: str = Header(...)):
    try:
        token = authorization.replace("Bearer ", "")
        payload = jwt.decode(token, SUPABASE_JWT_SECRET, algorithms=["HS256"],
                             audience="authenticated")
        return payload["sub"]  # user_id
    except JWTError:
        raise HTTPException(status_code=401, detail="Invalid token")
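
The `replace("Bearer ", "")` call above also accepts a header that has no `Bearer` scheme at all. A stricter parser could look like the following minimal sketch; the function name `extract_bearer_token` is hypothetical and not part of the Syntho codebase.

```python
from typing import Optional


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, else None."""
    if not authorization:
        return None
    parts = authorization.split()
    # Require exactly two parts: the scheme and the token.
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]
```

A caller would treat a `None` result as a 401 before attempting to decode the JWT.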

Row Level Security (RLS)

  • RLS MUST be enabled on ALL Supabase tables, no exceptions
  • Backend uses service role key (bypasses RLS), so backend code must manually enforce ownership checks
  • Frontend NEVER uses service role key, only anon key (RLS is the guard)
  • Always verify resource ownership before returning data:
# CORRECT: always filter by user_id
dataset = supabase.table("datasets").select("*").eq("id", dataset_id).eq("user_id", user_id).single().execute()

# WRONG: never fetch by id alone
dataset = supabase.table("datasets").select("*").eq("id", dataset_id).single().execute()

API Key Authentication

  • API keys are stored as SHA-256 hashes; never store raw keys in the database
  • The raw key is shown to the user ONCE on creation and is never retrievable again
  • Always use secrets.compare_digest() for hash comparison (timing-safe)
  • API keys must be checked for: is_active=True, not expired, correct scope for the operation
import hashlib, secrets

def hash_api_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode()).hexdigest()

def verify_api_key(raw_key: str, stored_hash: str) -> bool:
    return secrets.compare_digest(hash_api_key(raw_key), stored_hash)
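
The last bullet above (active flag, expiry, scope) is not covered by `verify_api_key` alone. The sketch below shows one way to combine the checks. `ApiKeyRecord`, `generate_api_key`, and `authorize_api_key` are hypothetical names, and the `sk_live_` prefix is assumed from the rate-limit table in this document.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ApiKeyRecord:
    """Hypothetical shape of a stored API key row (never holds the raw key)."""
    key_hash: str
    scopes: set = field(default_factory=set)
    is_active: bool = True
    expires_at: Optional[datetime] = None


def generate_api_key() -> tuple:
    """Return (raw_key, key_hash). Show raw_key once; persist only key_hash."""
    raw_key = "sk_live_" + secrets.token_urlsafe(32)
    return raw_key, hashlib.sha256(raw_key.encode()).hexdigest()


def authorize_api_key(raw_key: str, record: ApiKeyRecord, required_scope: str,
                      now: Optional[datetime] = None) -> bool:
    """Check hash, active flag, expiry, and scope, in that order."""
    now = now or datetime.now(timezone.utc)
    presented = hashlib.sha256(raw_key.encode()).hexdigest()
    if not secrets.compare_digest(presented, record.key_hash):
        return False
    if not record.is_active:
        return False
    if record.expires_at is not None and record.expires_at <= now:
        return False
    return required_scope in record.scopes
```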

Admin Routes

  • Admin routes MUST check profile.role == 'admin' after JWT verification
  • Admin role is set in the database; never trust a role claim from the frontend
  • Log all admin actions to an audit table with: admin_id, action, target_id, timestamp
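
As a minimal sketch of the rules above, the role check and the audit row could be shaped like this. The `profiles` dict stands in for the database table, and `Forbidden`, `require_admin`, and `build_audit_entry` are hypothetical names.

```python
from datetime import datetime, timezone


class Forbidden(Exception):
    """Raised when a non-admin user calls an admin action."""


def require_admin(user_id: str, profiles: dict) -> None:
    """Check the role stored server-side; `profiles` stands in for the DB table."""
    profile = profiles.get(user_id)
    if profile is None or profile.get("role") != "admin":
        raise Forbidden("Admin role required")


def build_audit_entry(admin_id: str, action: str, target_id: str) -> dict:
    """Build one audit row with the four fields listed above."""
    return {
        "admin_id": admin_id,
        "action": action,
        "target_id": target_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```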

πŸ›‘οΈ Input Validation & Sanitization

File Upload Security

  • Validate file type by MIME type AND magic bytes; never trust the file extension alone
  • Reject files that do not match expected types (CSV, JSON, Parquet, XLSX)
  • Max file size enforced at both frontend (client-side) and backend (server-side): 100MB
  • Sanitize file names: strip path separators, reject null bytes, normalize unicode
  • Store files with a UUID-based path; never use the original filename as the storage path
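import magic  # python-magic library
from fastapi import HTTPException

ALLOWED_MIME_TYPES = {
    "text/csv", "application/json",
    "application/vnd.apache.parquet",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
}

def validate_file(file_bytes: bytes, claimed_type: str):
    # Check the client-declared MIME type first, then the type detected from magic bytes
    if claimed_type not in ALLOWED_MIME_TYPES:
        raise HTTPException(400, "Invalid file type")
    detected = magic.from_buffer(file_bytes[:2048], mime=True)
    if detected not in ALLOWED_MIME_TYPES:
        raise HTTPException(400, "Invalid file type")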
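
The file-name and storage-path bullets above could be implemented along these lines. This is a sketch using only the standard library; `sanitize_display_name` and `storage_object_path` are hypothetical helpers, and the path is assumed to be the object name inside the bucket.

```python
import re
import unicodedata
import uuid


def sanitize_display_name(filename: str, max_length: int = 255) -> str:
    """Clean a user-supplied filename for display or metadata only."""
    if "\x00" in filename:
        raise ValueError("Filename contains a null byte")
    # Normalize first so look-alike separators become plain ones.
    name = unicodedata.normalize("NFKC", filename)
    # Keep only the last path component, whichever separator was used.
    name = re.split(r"[\\/]", name)[-1].strip()
    if name in ("", ".", ".."):
        raise ValueError("Filename is empty after sanitization")
    return name[:max_length]


def storage_object_path(user_id: str, dataset_id: str, extension: str) -> str:
    """Build a UUID-based object path; the original filename is never used."""
    return f"{user_id}/{dataset_id}/{uuid.uuid4()}.{extension}"
```

The sanitized name is only suitable for showing back to the user; the stored object is always addressed by the generated UUID path.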

API Request Validation

  • All request bodies validated with Pydantic models, no raw dict access
  • String fields: strip whitespace, enforce max lengths
  • Numeric fields: enforce min/max ranges (e.g., epochs: 100–500, num_rows: 1–1,000,000)
  • UUIDs: validate format before any DB query
  • Reject unexpected fields: use model_config = ConfigDict(extra='forbid') in Pydantic models

SQL Injection Prevention

  • NEVER use raw SQL string formatting; always use Supabase's parameterized query builder
  • If raw SQL is needed (migrations only), use parameterized queries exclusively
# CORRECT
supabase.table("datasets").select("*").eq("user_id", user_id)

# WRONG: never do this
supabase.rpc(f"SELECT * FROM datasets WHERE user_id = '{user_id}'")

🔑 Secrets Management

Environment Variable Rules

  • No secrets in source code, ever. Not even in comments or test files
  • No secrets in git history: add .env, .env.local, .env.* to .gitignore immediately
  • No secrets in frontend bundle: NEXT_PUBLIC_* vars are visible to everyone
  • Rotate any key that is accidentally committed to git immediately

What Goes Where

| Secret | Frontend | Backend | Modal |
|---|---|---|---|
| Supabase URL | ✅ NEXT_PUBLIC | ✅ | ✅ |
| Supabase Anon Key | ✅ NEXT_PUBLIC | ❌ | ❌ |
| Supabase Service Key | ❌ NEVER | ✅ | ✅ |
| Supabase JWT Secret | ❌ NEVER | ✅ | ❌ |
| Flutterwave Public Key | ✅ NEXT_PUBLIC | ❌ | ❌ |
| Flutterwave Secret Key | ❌ NEVER | ✅ | ❌ |
| Modal API Secret | ❌ NEVER | ✅ | ✅ |
| Redis URL | ❌ NEVER | ✅ | ❌ |

Secret Rotation Plan

  • Rotate Supabase JWT secret: update in backend .env + redeploy
  • Rotate Flutterwave keys: update in Render env vars + redeploy
  • Rotate Modal API secret: update in both backend .env AND Modal secret store
  • API keys (user-facing): users can revoke and regenerate at any time

🚦 Rate Limiting

Endpoints and Their Limits

| Endpoint | Limit | Window |
|---|---|---|
| POST /api/v1/datasets (upload) | 20 req | per hour per user |
| POST /api/v1/generate | 10 req | per hour per user |
| POST /api/v1/purchases/verify | 30 req | per hour per user |
| GET /api/v1/marketplace | 120 req | per minute per IP |
| All external API (sk_live_* keys) | 60 req | per minute per key |
| All external API (sk_live_* keys) | 1000 req | per day per key |
| POST /auth/* (login attempts) | 10 req | per 15 min per IP |

Implementation

  • Use Redis sliding window counter for all rate limits
  • Return standard headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
  • Return 429 with JSON body: {"error": "Rate limit exceeded", "retry_after": seconds}
  • Log repeated rate limit violations; they could indicate abuse or an attack
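
To illustrate the sliding window counter and the response shape above, here is an in-memory sketch. It deliberately swaps Redis for a plain dict so it can run standalone; a production version would keep the two window counters in Redis. The class name and the return structure are assumptions.

```python
import math
import time
from typing import Optional


class SlidingWindowCounter:
    """In-memory stand-in for the Redis-backed limiter described above."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (key, window_index) -> request count

    def check(self, key: str, now: Optional[float] = None) -> dict:
        now = time.time() if now is None else now
        index = int(now // self.window)
        elapsed = (now % self.window) / self.window
        prev = self.counts.get((key, index - 1), 0)
        curr = self.counts.get((key, index), 0)
        # Weight the previous window by the share of it still inside the sliding window.
        estimate = prev * (1 - elapsed) + curr
        reset = (index + 1) * self.window
        allowed = estimate < self.limit
        if allowed:
            self.counts[(key, index)] = curr + 1
            estimate += 1
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, math.floor(self.limit - estimate))),
            "X-RateLimit-Reset": str(int(reset)),
        }
        if allowed:
            return {"status": 200, "headers": headers}
        # Coarse hint only: points at the next window boundary.
        retry_after = max(1, math.ceil(reset - now))
        body = {"error": "Rate limit exceeded", "retry_after": retry_after}
        return {"status": 429, "headers": headers, "body": body}
```

The key would be the user ID, IP, or API key depending on the row in the table above, for example `limiter.check(f"generate:{user_id}")`.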

🌐 CORS & Headers

CORS Configuration (FastAPI)

# backend/app/main.py
from fastapi.middleware.cors import CORSMiddleware

ALLOWED_ORIGINS = [
    "https://your-syntho-app.vercel.app",  # production only
    "http://localhost:3000",               # local dev only
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=ALLOWED_ORIGINS,         # NEVER use ["*"] in production
    allow_credentials=True,
    allow_methods=["GET", "POST", "PATCH", "DELETE"],
    allow_headers=["Authorization", "Content-Type", "X-API-Secret"],
)

Security Headers (Next.js, vercel.json)

Always set these headers on all frontend responses:

{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "X-Frame-Options", "value": "DENY" },
        { "key": "X-Content-Type-Options", "value": "nosniff" },
        { "key": "Referrer-Policy", "value": "strict-origin-when-cross-origin" },
        { "key": "Permissions-Policy", "value": "camera=(), microphone=(), geolocation=()" },
        { "key": "Strict-Transport-Security", "value": "max-age=63072000; includeSubDomains; preload" }
      ]
    }
  ]
}

💳 Payment Security

Flutterwave Webhook Verification

  • ALWAYS verify the webhook hash before processing any payment event
  • Never grant access based solely on a frontend payment success callback
  • The backend must independently verify every transaction with Flutterwave's API
import hmac
from typing import Optional

def verify_flutterwave_webhook(signature: Optional[str], secret_hash: str) -> bool:
    # Flutterwave v3 sends the secret hash configured in the dashboard as-is
    # in the verif-hash header, so compare it directly (timing-safe).
    if not signature:
        return False
    return hmac.compare_digest(signature.encode(), secret_hash.encode())

# In webhook route:
signature = request.headers.get("verif-hash")
if not verify_flutterwave_webhook(signature, FLUTTERWAVE_WEBHOOK_HASH):
    raise HTTPException(400, "Invalid webhook signature")

Purchase Verification Flow

  1. Frontend sends tx_ref to backend after Flutterwave inline checkout success
  2. Backend calls Flutterwave /v3/transactions/verify_by_reference; never trust the frontend
  3. Verify: amount matches listing price, currency matches, status is "successful"
  4. Only then create purchase record and grant download access
  5. Webhook provides a second confirmation; use it for logging/reconciliation
  6. Make purchase creation idempotent: use ON CONFLICT (flutterwave_tx_ref) DO NOTHING
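
Step 3 of the flow above can be sketched as a pure function over the verification response. The response shape used here (a top-level `status` plus a `data` object with `status`, `tx_ref`, `currency`, and `amount`) is an assumption about the Flutterwave payload, and `is_valid_purchase` is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation


def is_valid_purchase(verify_response: dict, tx_ref: str,
                      listing_price: Decimal, listing_currency: str) -> bool:
    """Check a verification response against the listing before granting access."""
    data = verify_response.get("data") or {}
    if verify_response.get("status") != "success":
        return False
    if data.get("status") != "successful":
        return False
    if data.get("tx_ref") != tx_ref or data.get("currency") != listing_currency:
        return False
    try:
        # Convert via str so float amounts do not pick up binary rounding noise.
        amount = Decimal(str(data.get("amount")))
    except InvalidOperation:
        return False
    return amount == listing_price
```

Only when this returns `True` would the backend insert the purchase row, with the `ON CONFLICT` clause from step 6 keeping the insert idempotent.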

Download Access Control

  • Signed Supabase Storage URLs expire in 1 hour; never return permanent URLs
  • Before generating a signed URL, always verify the user has a completed purchase record
  • Re-generate URL on each request; do not cache signed URLs
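
The ordering these rules imply (purchase check first, then a fresh short-lived URL) can be sketched as follows. The two callables are injected placeholders for the real purchase lookup and the storage signing call; all names here are hypothetical.

```python
from typing import Callable

SIGNED_URL_TTL_SECONDS = 3600  # 1 hour, per the rule above


class PurchaseRequired(Exception):
    """Raised when the user has no completed purchase for the listing."""


def issue_download_url(user_id: str, listing_id: str, object_path: str,
                       has_completed_purchase: Callable[[str, str], bool],
                       sign_url: Callable[[str, int], str]) -> str:
    """Check the purchase first, then sign a fresh short-lived URL."""
    if not has_completed_purchase(user_id, listing_id):
        raise PurchaseRequired("No completed purchase for this listing")
    # A new URL is signed on every call; nothing is cached.
    return sign_url(object_path, SIGNED_URL_TTL_SECONDS)
```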

πŸ“ File Storage Security

Supabase Storage Policies

-- Users can only upload to their own folder
CREATE POLICY "users_upload_own_folder" ON storage.objects
  FOR INSERT TO authenticated
  WITH CHECK (bucket_id = 'datasets' AND (storage.foldername(name))[1] = auth.uid()::text);

-- Users can only read their own files
CREATE POLICY "users_read_own_files" ON storage.objects
  FOR SELECT TO authenticated
  USING (bucket_id IN ('datasets', 'synthetic', 'reports')
         AND (storage.foldername(name))[1] = auth.uid()::text);

-- Marketplace buyers can read purchased synthetic files
-- (handled via backend signed URL generation after purchase check, not direct storage access)

File Path Structure

Always use this path format to enforce ownership via storage policies:

datasets/{user_id}/{dataset_id}/{uuid}.csv        ✅
synthetic/{user_id}/{synthetic_id}/data.csv       ✅
reports/{user_id}/{synthetic_id}/compliance.pdf   ✅

datasets/myfile.csv                               ❌ Never flat paths
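
Because the backend uses the service role key, it cannot rely on the storage policies and may want to check the path format itself. The sketch below assumes paths are written as `bucket/user_id/...` as in the examples above; `owner_of` and `assert_owns_path` are hypothetical helpers.

```python
def owner_of(object_path: str) -> str:
    """Return the user_id segment of 'bucket/user_id/...' or raise ValueError."""
    parts = object_path.split("/")
    # Need at least bucket, user_id, and one more segment; no empty or dot segments.
    if len(parts) < 3 or any(p in ("", ".", "..") for p in parts):
        raise ValueError("Path must look like bucket/user_id/...")
    return parts[1]


def assert_owns_path(user_id: str, object_path: str) -> None:
    """Raise PermissionError when the path belongs to a different user."""
    if owner_of(object_path) != user_id:
        raise PermissionError("Path belongs to a different user")
```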

🤖 Modal ML Security

Endpoint Authentication

  • Modal web endpoint MUST validate X-API-Secret header on every request
  • Use secrets.compare_digest() for timing-safe comparison
  • Reject all requests without valid secret with 401
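# modal_ml/main.py
import os, secrets
from fastapi import HTTPException, Request

@app.function(...)
@modal.web_endpoint(method="POST")
async def run_job(request: Request):
    api_secret = request.headers.get("X-API-Secret", "")
    # Compare as bytes: compare_digest rejects non-ASCII str input
    if not secrets.compare_digest(api_secret.encode(), os.environ["MODAL_API_SECRET"].encode()):
        # Returning a (body, status) tuple would not set the 401 status code
        raise HTTPException(status_code=401, detail="Unauthorized")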

Input Validation in Modal

  • Validate all payload fields before processing (synthetic_dataset_id is valid UUID, method is in allowed list, config values are within safe ranges)
  • Max dataset size for ML processing: 500MB; reject larger files
  • CTGAN epochs: cap at 500 to prevent runaway GPU jobs
  • Always wrap ML job in try/except: update DB to 'failed' status on any exception, never leave jobs hanging in 'running'
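
A minimal sketch of the payload checks and the failure handling described above. The contents of `ALLOWED_METHODS`, the `'completed'` status, and the `train` and `set_status` callables are assumptions made for illustration; only the UUID check, the 500-epoch cap, and the `'running'` and `'failed'` statuses come from this document.

```python
import uuid

ALLOWED_METHODS = {"ctgan", "gaussian_copula"}  # hypothetical allowed list
MAX_EPOCHS = 500


def validate_payload(payload: dict) -> dict:
    """Return a cleaned payload or raise ValueError."""
    # uuid.UUID raises ValueError on a malformed or missing ID.
    job_id = str(uuid.UUID(str(payload.get("synthetic_dataset_id"))))
    method = payload.get("method")
    if method not in ALLOWED_METHODS:
        raise ValueError("Unsupported method")
    epochs = payload.get("epochs")
    if not isinstance(epochs, int) or not 1 <= epochs <= MAX_EPOCHS:
        raise ValueError("epochs out of range")
    return {"synthetic_dataset_id": job_id, "method": method, "epochs": epochs}


def run_job_safely(payload: dict, train, set_status) -> None:
    """Run `train` and never leave the job stuck in 'running'."""
    clean = validate_payload(payload)
    job_id = clean["synthetic_dataset_id"]
    set_status(job_id, "running")
    try:
        train(clean)
    except Exception:
        set_status(job_id, "failed")
        raise
    set_status(job_id, "completed")
```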

πŸ” Data Privacy & Compliance

Handling Real User Data

  • Original uploaded datasets may contain real PII; treat them as sensitive
  • Never log dataset contents; only log metadata (file size, row count, column names)
  • Original files are deleted from Supabase Storage when the user deletes the dataset
  • Synthetic files are what get shared/sold, never the originals

Privacy Score Enforcement

  • Datasets with privacy_score < 40 (critical risk) MUST NOT be listable on marketplace
  • Enforce this check server-side in the listing creation endpoint; never rely on frontend
  • Compliance report must show PASSED before a dataset can be listed
# backend/app/routers/marketplace.py
# ALWAYS enforce minimum privacy score before allowing listing
privacy = supabase.table("privacy_scores").select("overall_score, risk_level") \
    .eq("synthetic_dataset_id", synthetic_dataset_id).single().execute().data

if privacy["overall_score"] < 40 or privacy["risk_level"] == "critical":
    raise HTTPException(400, "Dataset privacy score too low to list on marketplace")

GDPR Compliance for Syntho Itself

  • Users can request deletion of their account + all data (implement DELETE /api/v1/account endpoint)
  • Deleting a user cascades to: datasets, synthetic_datasets, api_keys, purchases (buyer side), marketplace_listings
  • Seller listings that have been purchased must be anonymized (seller_id → NULL), not deleted
  • Store only the minimum data needed; do not collect unnecessary user info

🚨 Error Handling & Logging

What to Log (Safe)

  • Request method, path, status code, response time
  • User ID (not email), dataset ID, job ID
  • Error types and stack traces (server-side only)
  • Rate limit violations with IP
  • Failed authentication attempts with IP

What to Never Log

  • JWT tokens or API keys (even partial, except the display prefix)
  • File contents or dataset data
  • User emails or personal info in application logs
  • Flutterwave secret keys or webhook payloads with card data
  • Supabase service role key
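
One way to back these rules with code is a logging filter that masks secrets before a record is emitted. This is a sketch using only the standard library. The two patterns are assumptions (bearer tokens, and API keys starting with the `sk_live_` display prefix) and would need extending for other secret formats.

```python
import logging
import re

# Patterns for values that must never reach the logs.
_BEARER = re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+")
_API_KEY = re.compile(r"(sk_live_)[A-Za-z0-9_-]+")


def redact(text: str) -> str:
    """Mask bearer tokens and API keys, keeping only the display prefix."""
    text = _BEARER.sub(r"\1[REDACTED]", text)
    return _API_KEY.sub(r"\1[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that redacts secrets from the formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True
```

Attaching it with `handler.addFilter(RedactingFilter())` applies the redaction to every record that handler processes.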

Error Response Format

Never expose internal error details to the client:

# CORRECT: safe error response
raise HTTPException(status_code=500, detail="Internal server error")

# WRONG: leaks implementation details
raise HTTPException(status_code=500, detail=str(e))  # exposes the internal exception message

🔒 Dependency Security

Keep Dependencies Updated

  • Run pip-audit monthly on backend dependencies
  • Run npm audit monthly on frontend dependencies
  • Pin dependency versions in requirements.txt and package.json (already done in context.md)
  • Never install packages with flags that skip security checks (for example, npm's --no-audit)

Trusted Packages Only

All ML libraries used (SDV, CTGAN, Presidio, scikit-learn) are from verified, maintained sources. Before adding any new dependency:

  • Check it has recent commits and active maintenance
  • Check PyPI/npm download counts
  • Review open security advisories

✅ Security Checklist (Complete Before Launch)

Backend

  • All routes require authentication (except /health, /api/webhooks/*)
  • RLS enabled on all Supabase tables
  • File upload validates MIME type + magic bytes
  • Rate limiting active on all endpoints
  • CORS locked to production Vercel URL only
  • Flutterwave webhook signature verified on every event
  • Modal endpoint validates X-API-Secret
  • No secrets in source code or git history
  • Error responses never expose stack traces

Frontend

  • No service role key or secret keys in any frontend code
  • Security headers set in vercel.json
  • Signed URLs used for all file downloads (never permanent URLs)
  • Auth token sent in Authorization header (not URL params or cookies without Secure flag)

Database

  • RLS policies tested: user A cannot access user B's data
  • Admin role verified server-side (not from frontend claim)
  • Cascade deletes work correctly (test account deletion)
  • Storage bucket policies block cross-user access

Payments

  • Purchase verification calls Flutterwave API, not just frontend callback
  • Webhook signature verified before processing
  • Download access checked server-side before every signed URL generation
  • Privacy score >= 40 enforced before marketplace listing

Marketplace

  • Listings only show schema preview β€” never raw data to non-buyers
  • Critical privacy score datasets blocked from listing
  • Compliance report PASSED required before listing
