| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +MapSee-AI is a Python-based SNS content data extraction pipeline that processes Instagram and YouTube content to extract place/location information. It is a FastAPI service that receives URLs, downloads media content, performs speech-to-text (STT), and uses an LLM (Gemini) to extract structured place data. |
| 8 | + |
| 9 | +## Development Commands |
| 10 | + |
| 11 | +```bash |
| 12 | +# Install dependencies (Python 3.13+) |
| 13 | +uv sync |
| 14 | + |
| 15 | +# Run the development server |
| 16 | +uv run uvicorn src.main:app --host 0.0.0.0 --port 8001 --reload |
| 17 | + |
| 18 | +# Alternative: run directly |
| 19 | +uv run python -m src.main |
| 20 | +``` |
| 21 | + |
| 22 | +### External Dependencies |
| 23 | +- **ffmpeg/ffprobe**: Required for audio/video processing |
| 24 | +- **yt-dlp**: Used for downloading Instagram/YouTube content |
| 25 | + |
| 26 | +## Architecture |
| 27 | + |
| 28 | +### Request Flow |
| 29 | +1. `/api/extract-places` receives `contentId` + `snsUrl` |
| 30 | +2. Request returns immediately (async processing) |
| 31 | +3. Background task runs the extraction pipeline |
| 32 | +4. Results are sent to the backend via the callback URL |
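
The accept-then-process pattern in the steps above can be sketched as follows. This is a minimal illustration using plain `asyncio` in place of FastAPI's background task machinery, so that it runs without third-party packages. The names `handle_request`, `run_extraction`, and `send_callback`, along with the response and payload shapes, are assumptions for illustration and are not taken from the repository.

```python
import asyncio


async def run_extraction(content_id, sns_url, send_callback):
    """Stand-in for the extraction pipeline (hypothetical)."""
    # The real pipeline would download media, run STT, and call the LLM here.
    await asyncio.sleep(0)
    result = {"contentId": content_id, "places": []}  # assumed payload shape
    # Step 4: deliver the result to the backend via the callback.
    await send_callback(result)


async def handle_request(content_id, sns_url, send_callback, tasks):
    """Schedule the pipeline and return immediately (steps 1-3)."""
    task = asyncio.create_task(run_extraction(content_id, sns_url, send_callback))
    # Keep a reference so the task is not garbage collected before it finishes.
    tasks.append(task)
    return {"contentId": content_id, "status": "accepted"}  # assumed response shape
```

The caller gets its response before any extraction work has run; the result only reaches the backend once the scheduled task completes and invokes the callback.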
| 33 | + |
| 34 | +### Pipeline Stages (workflow.py) |
| 35 | +``` |
| 36 | +URL → sns_router → get_audio → get_transcription (STT) → get_video_narration → get_llm_response → callback |
| 37 | + ↓ |
| 38 | + Platform detection (YouTube/Instagram) |
| 39 | + Content type detection (video/image) |
| 40 | + Download media via yt-dlp |
| 41 | +``` |
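
As an illustration of the platform detection step in the diagram above, here is a minimal sketch based on the URL's hostname. The function name `detect_platform` and the host sets are hypothetical; the actual router may use different rules (for example, regular expressions or additional domains).

```python
from urllib.parse import urlparse

# Hypothetical host lists; the real service may recognize other domains.
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}
INSTAGRAM_HOSTS = {"instagram.com"}


def detect_platform(sns_url: str) -> str:
    """Return "youtube", "instagram", or "unknown" for the given URL."""
    host = (urlparse(sns_url).hostname or "").lower()
    # Treat "www.example.com" and "example.com" as the same host.
    if host.startswith("www."):
        host = host[4:]
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host in INSTAGRAM_HOSTS:
        return "instagram"
    return "unknown"
```

Returning `"unknown"` rather than raising leaves the decision about unsupported URLs to the caller, which in this sketch is assumed to be the routing stage.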
| 42 | + |
| 43 | +### Key Components |
| 44 | + |
| 45 | +**src/apis/**: FastAPI routers |
| 46 | +- `place_router.py`: Main API endpoint for place extraction |
| 47 | + |
| 48 | +**src/services/**: Business logic |
| 49 | +- `workflow.py`: Main extraction pipeline orchestration |
| 50 | +- `content_router.py`: Routes to appropriate downloader based on platform/content type |
| 51 | +- `background_tasks.py`: Async task execution and callback handling |
| 52 | +- `smb_service.py`: SMB file server integration |
| 53 | + |
| 54 | +**src/services/modules/**: Processing modules |
| 55 | +- `llm.py`: Gemini API integration for place extraction |
| 56 | +- `stt.py`: Faster-Whisper speech-to-text |
| 57 | + |
| 58 | +**src/services/preprocess/**: Media preprocessing |
| 59 | +- `sns.py`: Instagram/YouTube content download (yt-dlp) |
| 60 | +- `audio.py`: FFmpeg audio extraction |
| 61 | +- `video.py`: Video frame extraction (OCR currently disabled) |
| 62 | + |
| 63 | +**src/models/**: Pydantic schemas |
| 64 | +- `ExtractionState`: TypedDict that flows through the pipeline, accumulating data at each stage |
| 65 | + |
| 66 | +**src/core/**: Configuration and utilities |
| 67 | +- `config.py`: Settings from .env (API keys, SMB config, etc.) |
| 68 | +- `exceptions.py`: CustomError class for pipeline errors |
| 69 | + |
| 70 | +### State Flow Pattern |
| 71 | +The pipeline uses `ExtractionState` (TypedDict) as a mutable state object that gets passed through each processing stage. Each stage updates specific fields: |
| 72 | +- `contentStream`/`imageStream`: Downloaded media |
| 73 | +- `captionText`: Post caption/description |
| 74 | +- `audioStream`: Extracted audio |
| 75 | +- `transcriptionText`: STT output |
| 76 | +- `ocrText`: Video text (currently disabled) |
| 77 | +- `result`: Final extracted places |
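
A minimal sketch of this state-passing pattern is shown below. The field names come from the list above, but the field types, the stage functions `fake_download` and `fake_transcribe`, and the `run_pipeline` helper are assumptions for illustration; the real definitions live in `src/models/` and `workflow.py`.

```python
from io import BytesIO
from typing import TypedDict


class ExtractionState(TypedDict, total=False):
    # Field names follow the list above; the types are assumptions.
    contentStream: BytesIO
    imageStream: BytesIO
    captionText: str
    audioStream: BytesIO
    transcriptionText: str
    ocrText: str
    result: list


def fake_download(state: ExtractionState) -> ExtractionState:
    """Hypothetical download stage: fills in media and caption fields."""
    state["contentStream"] = BytesIO(b"fake video bytes")
    state["captionText"] = "Great coffee at Cafe Example"
    return state


def fake_transcribe(state: ExtractionState) -> ExtractionState:
    """Hypothetical STT stage: fills in the transcription field."""
    state["transcriptionText"] = "we visited cafe example today"
    return state


def run_pipeline(state: ExtractionState, stages) -> ExtractionState:
    """Pass the same state object through each stage in order."""
    for stage in stages:
        state = stage(state)
    return state
```

Because each stage only writes its own fields, a later stage can read what earlier stages produced, and fields that no stage has populated yet are simply absent from the dictionary.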
| 78 | + |
| 79 | +## Configuration |
| 80 | + |
| 81 | +Required environment variables in `.env`: |
| 82 | +- `GOOGLE_API_KEY`: Gemini API key |
| 83 | +- `AI_SERVER_API_KEY`: API key for this service |
| 84 | +- `YOUTUBE_API_KEY`: YouTube Data API key |
| 85 | +- `INSTAGRAM_POST_DOC_ID`, `INSTAGRAM_APP_ID`: Instagram API config |
| 86 | +- `BACKEND_CALLBACK_URL`, `BACKEND_API_KEY`: Callback endpoint config |
| 87 | +- `SMB_*`: SMB file server settings (optional) |
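
A quick way to check that the required variables are present before starting the server is sketched below. The helper `missing_env_vars` is hypothetical and not part of the repository; `config.py` may load and validate settings differently. The optional `SMB_*` variables are deliberately left out of the check.

```python
import os

# Required variable names, taken from the list above.
REQUIRED_VARS = [
    "GOOGLE_API_KEY",
    "AI_SERVER_API_KEY",
    "YOUTUBE_API_KEY",
    "INSTAGRAM_POST_DOC_ID",
    "INSTAGRAM_APP_ID",
    "BACKEND_CALLBACK_URL",
    "BACKEND_API_KEY",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; passing a dictionary makes the check easy to exercise in tests.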
| 88 | + |
| 89 | +## Notes |
| 90 | + |
| 91 | +- OCR functionality is currently disabled (marked with comments throughout the code) |
| 92 | +- The service uses in-memory BytesIO streams for media processing |
| 93 | +- Faster-Whisper runs on CPU with int8 quantization by default |
| 94 | +- LLM responses are validated against Pydantic schemas using `response_json_schema` |