## Authentication

- Every FastAPI route (except `/health` and `/api/webhooks/*`) MUST verify the Supabase JWT
- Use the `get_current_user` dependency, injected into every protected route
- Never trust `user_id` from the request body - always extract it from the verified JWT
- The JWT secret lives ONLY in the backend `.env` - never in frontend code
```python
# backend/app/middleware/auth.py
# ALWAYS use this pattern - never skip auth on protected routes
from fastapi import Header, HTTPException
from jose import JWTError, jwt  # assumes python-jose for JWT decoding

async def get_current_user(authorization: str = Header(...)) -> str:
    try:
        token = authorization.replace("Bearer ", "")
        payload = jwt.decode(
            token,
            SUPABASE_JWT_SECRET,
            algorithms=["HS256"],
            audience="authenticated",
        )
        return payload["sub"]  # user_id
    except JWTError:
        raise HTTPException(status_code=401, detail="Invalid token")
```

## Row Level Security (RLS)

- RLS MUST be enabled on ALL Supabase tables - no exceptions
- The backend uses the service role key (which bypasses RLS), so backend code must manually enforce ownership checks
- The frontend NEVER uses the service role key - only the anon key (RLS is the guard)
- Always verify resource ownership before returning data:
```python
# CORRECT - always filter by user_id
dataset = supabase.table("datasets").select("*").eq("id", dataset_id).eq("user_id", user_id).single()

# WRONG - never fetch by id alone
dataset = supabase.table("datasets").select("*").eq("id", dataset_id).single()
```

## API Keys

- API keys are stored as SHA-256 hashes - never store raw keys in the database
- The raw key is shown to the user ONCE on creation - it is never retrievable again
- Always use `secrets.compare_digest()` for hash comparison (timing-safe)
- API keys must be checked for: `is_active=True`, not expired, correct scope for the operation
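The full key check can be sketched as a single helper. The record fields (`key_hash`, `is_active`, `expires_at`, `scopes`) are assumed column names for illustration, not the actual schema:

```python
import hashlib
import secrets
from datetime import datetime, timezone

def check_api_key(raw_key: str, record: dict, required_scope: str) -> bool:
    """Accept a key only if it matches its stored SHA-256 hash, is active,
    is unexpired, and carries the scope required for this operation."""
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    if not secrets.compare_digest(key_hash, record["key_hash"]):  # timing-safe
        return False
    if not record["is_active"]:
        return False
    expires_at = record.get("expires_at")  # ISO-8601 string or None (assumed format)
    if expires_at and datetime.fromisoformat(expires_at) < datetime.now(timezone.utc):
        return False
    return required_scope in record["scopes"]
```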
```python
import hashlib, secrets

def hash_api_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode()).hexdigest()

def verify_api_key(raw_key: str, stored_hash: str) -> bool:
    return secrets.compare_digest(hash_api_key(raw_key), stored_hash)
```

## Admin Routes

- Admin routes MUST check `profile.role == 'admin'` after JWT verification
- The admin role is set in the database - never trust a role claim from the frontend
- Log all admin actions to an audit table with: admin_id, action, target_id, timestamp
## File Uploads

- Validate file type by MIME type AND magic bytes - never trust the file extension alone
- Reject files that do not match the expected types (CSV, JSON, Parquet, XLSX)
- Enforce the 100MB max file size at both the frontend (client-side) and the backend (server-side)
- Sanitize file names: strip path separators, reject null bytes, normalize unicode
- Store files under a UUID-based path - never use the original filename as the storage path
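The filename rules above can be sketched like this; the allowed-extension set mirrors the accepted types and is illustrative:

```python
import os
import unicodedata
import uuid

ALLOWED_EXTENSIONS = {".csv", ".json", ".parquet", ".xlsx"}

def safe_storage_name(original_name: str) -> str:
    """Reject null bytes, strip path separators, normalize unicode,
    then discard the name entirely in favour of a UUID-based one."""
    if "\x00" in original_name:
        raise ValueError("Null byte in filename")
    # Strip any path components the client smuggled in
    name = original_name.replace("\\", "/").split("/")[-1]
    name = unicodedata.normalize("NFKC", name)
    ext = os.path.splitext(name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("Unexpected file extension")
    return f"{uuid.uuid4()}{ext}"  # original filename never becomes the path
```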
```python
import magic  # python-magic library
from fastapi import HTTPException

ALLOWED_MIME_TYPES = {
    "text/csv",
    "application/json",
    "application/vnd.apache.parquet",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

def validate_file(file_bytes: bytes, claimed_type: str):
    detected = magic.from_buffer(file_bytes[:2048], mime=True)
    if detected not in ALLOWED_MIME_TYPES:
        raise HTTPException(400, "Invalid file type")
```

## Input Validation

- All request bodies are validated with Pydantic models - no raw dict access
- String fields: strip whitespace, enforce max lengths
- Numeric fields: enforce min/max ranges (e.g., epochs: 100-500, num_rows: 1-1,000,000)
- UUIDs: validate the format before any DB query
- Reject unexpected fields: use `model_config = ConfigDict(extra='forbid')` in Pydantic models
- NEVER use raw SQL string formatting - always use Supabase's parameterized query builder
- If raw SQL is needed (migrations only), use parameterized queries exclusively
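An illustrative Pydantic v2 request model applying these rules; the field names and ranges are examples taken from the bullets, not the actual schemas:

```python
from uuid import UUID
from pydantic import BaseModel, ConfigDict, Field

class GenerateRequest(BaseModel):
    # Reject unexpected fields and strip whitespace on all string fields
    model_config = ConfigDict(extra="forbid", str_strip_whitespace=True)

    dataset_id: UUID                       # validated as a UUID before any DB query
    method: str = Field(min_length=1, max_length=50)
    epochs: int = Field(ge=100, le=500)    # numeric range enforced
    num_rows: int = Field(ge=1, le=1_000_000)
```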
```python
# CORRECT
supabase.table("datasets").select("*").eq("user_id", user_id)

# WRONG - never do this
supabase.rpc(f"SELECT * FROM datasets WHERE user_id = '{user_id}'")
```

## Secrets Management

- No secrets in source code - ever. Not even in comments or test files
- No secrets in git history - add `.env`, `.env.local`, and `.env.*` to `.gitignore` immediately
- No secrets in the frontend bundle - `NEXT_PUBLIC_*` vars are visible to everyone
- Rotate any key that is accidentally committed to git immediately
| Secret | Frontend | Backend | Modal |
|---|---|---|---|
| Supabase URL | ✅ NEXT_PUBLIC | ✅ | ✅ |
| Supabase Anon Key | ✅ NEXT_PUBLIC | ✅ | ❌ |
| Supabase Service Key | ❌ NEVER | ✅ | ✅ |
| Supabase JWT Secret | ❌ NEVER | ✅ | ❌ |
| Flutterwave Public Key | ✅ NEXT_PUBLIC | ✅ | ❌ |
| Flutterwave Secret Key | ❌ NEVER | ✅ | ❌ |
| Modal API Secret | ❌ NEVER | ✅ | ✅ |
| Redis URL | ❌ NEVER | ✅ | ❌ |
## Key Rotation

- Rotate the Supabase JWT secret: update it in the backend `.env` + redeploy
- Rotate Flutterwave keys: update them in Render env vars + redeploy
- Rotate the Modal API secret: update it in both the backend `.env` AND the Modal secret store
- API keys (user-facing): users can revoke and regenerate them at any time
## Rate Limiting

| Endpoint | Limit | Window |
|---|---|---|
| POST /api/v1/datasets (upload) | 20 req | per hour per user |
| POST /api/v1/generate | 10 req | per hour per user |
| POST /api/v1/purchases/verify | 30 req | per hour per user |
| GET /api/v1/marketplace | 120 req | per minute per IP |
| All external API (sk_live_* keys) | 60 req | per minute per key |
| All external API (sk_live_* keys) | 1000 req | per day per key |
| POST /auth/* (login attempts) | 10 req | per 15 min per IP |
- Use a Redis sliding-window counter for all rate limits
- Return the standard headers: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`
- Return 429 with a JSON body: `{"error": "Rate limit exceeded", "retry_after": seconds}`
- Log repeated rate limit violations - they could indicate abuse or an attack
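One way to sketch the sliding-window counter with a Redis sorted set, using the redis-py pipeline API; the key naming is an assumption about the deployment:

```python
import time
import uuid

def is_allowed(redis_client, key: str, limit: int, window_seconds: int) -> bool:
    """Sliding-window check: one sorted-set member per request, scored by
    timestamp; entries older than the window are pruned before counting."""
    now = time.time()
    pipe = redis_client.pipeline()
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # drop expired entries
    pipe.zadd(key, {f"{now}:{uuid.uuid4()}": now})       # record this request
    pipe.zcard(key)                                      # count requests in window
    pipe.expire(key, window_seconds)                     # let idle keys disappear
    _, _, count, _ = pipe.execute()
    return count <= limit
```

A key such as `f"rl:generate:{user_id}"` with `limit=10, window_seconds=3600` would implement the 10/hour rule above (a suggested naming scheme, not the project's).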
## CORS

```python
# backend/app/main.py
from fastapi.middleware.cors import CORSMiddleware

ALLOWED_ORIGINS = [
    "https://your-syntho-app.vercel.app",  # production only
    "http://localhost:3000",               # local dev only
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=ALLOWED_ORIGINS,  # NEVER use ["*"] in production
    allow_credentials=True,
    allow_methods=["GET", "POST", "PATCH", "DELETE"],
    allow_headers=["Authorization", "Content-Type", "X-API-Secret"],
)
```

## Security Headers

Always set these headers on all frontend responses (in `vercel.json`):
```json
{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "X-Frame-Options", "value": "DENY" },
        { "key": "X-Content-Type-Options", "value": "nosniff" },
        { "key": "Referrer-Policy", "value": "strict-origin-when-cross-origin" },
        { "key": "Permissions-Policy", "value": "camera=(), microphone=(), geolocation=()" },
        { "key": "Strict-Transport-Security", "value": "max-age=63072000; includeSubDomains; preload" }
      ]
    }
  ]
}
```

## Flutterwave Webhooks

- ALWAYS verify the webhook hash before processing any payment event
- Never grant access based solely on a frontend payment success callback
- The backend must independently verify every transaction with Flutterwave's API
```python
import hmac, hashlib

def verify_flutterwave_webhook(payload_str: str, signature: str, secret_hash: str) -> bool:
    expected = hmac.new(
        secret_hash.encode(), payload_str.encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

# In the webhook route:
signature = request.headers.get("verif-hash")
if not verify_flutterwave_webhook(body_str, signature, FLUTTERWAVE_WEBHOOK_HASH):
    raise HTTPException(400, "Invalid webhook signature")
```

## Payment Verification Flow

- The frontend sends `tx_ref` to the backend after a successful Flutterwave inline checkout
- The backend calls Flutterwave's `/v3/transactions/verify_by_reference` - never trust the frontend
- Verify that the amount matches the listing price, the currency matches, and the status is "successful"
- Only then create purchase record and grant download access
- The webhook provides a second confirmation - use it for logging and reconciliation
- Make purchase creation idempotent: use `ON CONFLICT (flutterwave_tx_ref) DO NOTHING`
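The matching logic can be kept separate from the HTTP call so it stays testable. This is a sketch, not the project's actual code: the endpoint is the `verify_by_reference` route named above, and `secret_key` is assumed to be the Flutterwave secret key from the backend env:

```python
import json
import urllib.parse
import urllib.request

def fetch_transaction(tx_ref: str, secret_key: str) -> dict:
    """GET /v3/transactions/verify_by_reference - server-side only."""
    url = ("https://api.flutterwave.com/v3/transactions/verify_by_reference?"
           + urllib.parse.urlencode({"tx_ref": tx_ref}))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {secret_key}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("data") or {}

def transaction_matches(data: dict, expected_amount, expected_currency: str) -> bool:
    """All three conditions must hold before granting access."""
    return (
        data.get("status") == "successful"
        and data.get("amount") == expected_amount
        and data.get("currency") == expected_currency
    )
```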
## Signed Download URLs

- Signed Supabase Storage URLs expire in 1 hour - never return permanent URLs
- Before generating a signed URL, always verify the user has a completed purchase record
- Re-generate the URL on each request - do not cache signed URLs
```sql
-- Users can only upload to their own folder
CREATE POLICY "users_upload_own_folder" ON storage.objects
FOR INSERT TO authenticated
WITH CHECK (bucket_id = 'datasets' AND (storage.foldername(name))[1] = auth.uid()::text);

-- Users can only read their own files
CREATE POLICY "users_read_own_files" ON storage.objects
FOR SELECT TO authenticated
USING (bucket_id IN ('datasets', 'synthetic', 'reports')
  AND (storage.foldername(name))[1] = auth.uid()::text);

-- Marketplace buyers read purchased synthetic files via backend-generated
-- signed URLs after a purchase check - not via direct storage access
```

## Storage Path Format

Always use this path format to enforce ownership via the storage policies:
```text
datasets/{user_id}/{dataset_id}/{uuid}.csv        ✅
synthetic/{user_id}/{synthetic_id}/data.csv       ✅
reports/{user_id}/{synthetic_id}/compliance.pdf   ✅
datasets/myfile.csv                               ❌ never use flat paths
```
## Modal Endpoint Security

- The Modal web endpoint MUST validate the `X-API-Secret` header on every request
- Use `secrets.compare_digest()` for a timing-safe comparison
- Reject all requests without a valid secret with a 401
```python
# modal_ml/main.py
import os, secrets

from fastapi.responses import JSONResponse

@app.function(...)
@modal.web_endpoint(method="POST")
async def run_job(request: Request):
    api_secret = request.headers.get("X-API-Secret", "")
    if not secrets.compare_digest(api_secret, os.environ["MODAL_API_SECRET"]):
        # Return an explicit 401 response (a bare tuple would not set the status code)
        return JSONResponse({"error": "Unauthorized"}, status_code=401)
```

- Validate all payload fields before processing (`synthetic_dataset_id` is a valid UUID, the method is in the allowed list, config values are within safe ranges)
- Max dataset size for ML processing: 500MB - reject larger files
- CTGAN epochs: cap at 500 - prevent runaway GPU jobs
- Always wrap the ML job in try/except - update the DB to 'failed' status on any exception; never leave jobs hanging in 'running'
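The job-guard rules above, sketched as a wrapper; `mark_status` is a stand-in for the real DB update, and the limits mirror the caps listed:

```python
MAX_DATASET_BYTES = 500 * 1024 * 1024  # 500MB cap for ML processing
MAX_CTGAN_EPOCHS = 500                 # prevent runaway GPU jobs

def run_guarded_job(job_id: str, dataset_bytes: int, epochs: int, train, mark_status) -> str:
    """Run a training callable under the size/epoch caps; on any failure
    the job is marked 'failed' so it never hangs in 'running'."""
    if dataset_bytes > MAX_DATASET_BYTES:
        mark_status(job_id, "failed")
        return "rejected"
    epochs = min(epochs, MAX_CTGAN_EPOCHS)  # hard cap, never trust the payload
    mark_status(job_id, "running")
    try:
        train(epochs)
        mark_status(job_id, "completed")
        return "completed"
    except Exception:
        mark_status(job_id, "failed")  # never leave the job in 'running'
        return "failed"
```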
## PII Handling

- Original uploaded datasets may contain real PII - treat them as sensitive
- Never log dataset contents - log only metadata (file size, row count, column names)
- Original files are deleted from Supabase Storage when the user deletes the dataset
- Synthetic files are what get shared or sold - never the originals
## Marketplace Privacy Gate

- Datasets with `privacy_score < 40` (critical risk) MUST NOT be listable on the marketplace
- Enforce this check server-side in the listing creation endpoint - never rely on the frontend
- The compliance report must show PASSED before a dataset can be listed
```python
# backend/app/routers/marketplace.py
# ALWAYS enforce the minimum privacy score before allowing a listing
privacy = supabase.table("privacy_scores").select("overall_score, risk_level") \
    .eq("synthetic_dataset_id", synthetic_dataset_id).single().data
if privacy["overall_score"] < 40 or privacy["risk_level"] == "critical":
    raise HTTPException(400, "Dataset privacy score too low to list on marketplace")
```

## Account Deletion & Data Minimization

- Users can request deletion of their account and all their data (implement a DELETE /api/v1/account endpoint)
- Deleting a user cascades to: datasets, synthetic_datasets, api_keys, purchases (buyer side), marketplace_listings
- Seller listings that have been purchased must be anonymized (set seller_id to NULL), not deleted
- Store only the minimum data needed - do not collect unnecessary user info
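One way to make the cascade/anonymize order explicit is to build the plan as data first. The table and column names here are assumptions derived from the bullets, including a hypothetical `purchased` flag:

```python
def deletion_plan(user_id: str) -> list:
    """Ordered operations for account deletion: anonymize sold listings,
    delete everything else the user owns, then the profile row itself."""
    return [
        # Purchased listings keep their rows but lose the seller link
        ("update", "marketplace_listings",
         {"where": {"seller_id": user_id, "purchased": True},
          "set": {"seller_id": None}}),
        ("delete", "marketplace_listings", {"where": {"seller_id": user_id, "purchased": False}}),
        ("delete", "purchases", {"where": {"buyer_id": user_id}}),
        ("delete", "api_keys", {"where": {"user_id": user_id}}),
        ("delete", "synthetic_datasets", {"where": {"user_id": user_id}}),
        ("delete", "datasets", {"where": {"user_id": user_id}}),
        ("delete", "profiles", {"where": {"id": user_id}}),
    ]
```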
## Logging

Always log:

- Request method, path, status code, response time
- User ID (not email), dataset ID, job ID
- Error types and stack traces (server-side only)
- Rate limit violations, with IP
- Failed authentication attempts, with IP

Never log:

- JWT tokens or API keys (even partially - except the display prefix)
- File contents or dataset data
- User emails or personal info in application logs
- Flutterwave secret keys or webhook payloads containing card data
- The Supabase service role key
## Error Responses

Never expose internal error details to the client:

```python
# CORRECT - safe error response
raise HTTPException(status_code=500, detail="Internal server error")

# WRONG - leaks implementation details to the client
raise HTTPException(status_code=500, detail=str(e))
```

## Dependencies

- Run `pip-audit` monthly on backend dependencies
- Run `npm audit` monthly on frontend dependencies
- Pin dependency versions in requirements.txt and package.json (already done in context.md)
- Never install packages with flags that bypass security or integrity checks
## Supply Chain

All ML libraries used (SDV, CTGAN, Presidio, scikit-learn) come from verified, maintained sources. Before adding any new dependency:
- Check it has recent commits and active maintenance
- Check PyPI/npm download counts
- Review open security advisories
## Security Checklist

- All routes require authentication (except /health, /webhooks)
- RLS enabled on all Supabase tables
- File upload validates MIME type + magic bytes
- Rate limiting active on all endpoints
- CORS locked to production Vercel URL only
- Flutterwave webhook signature verified on every event
- Modal endpoint validates X-API-Secret
- No secrets in source code or git history
- Error responses never expose stack traces
- No service role key or secret keys in any frontend code
- Security headers set in vercel.json
- Signed URLs used for all file downloads (never permanent URLs)
- Auth token sent in Authorization header (not URL params or cookies without Secure flag)
- RLS policies tested: user A cannot access user B's data
- Admin role verified server-side (not from frontend claim)
- Cascade deletes work correctly (test account deletion)
- Storage bucket policies block cross-user access
- Purchase verification calls Flutterwave API β not just frontend callback
- Webhook signature verified before processing
- Download access checked server-side before every signed URL generation
- Privacy score >= 40 enforced before marketplace listing
- Listings only show schema preview β never raw data to non-buyers
- Critical privacy score datasets blocked from listing
- Compliance report PASSED required before listing