NYU x Corner Datathon 2025 - 1st Place Winner
A full-stack web application that recommends NYC venues using retrieval-augmented generation (RAG) with hybrid search and reranking.
Visit Vibio to use it yourself!
This project consists of:
- Frontend: Next.js 15 with TypeScript and Tailwind CSS
- Backend: FastAPI with Python
- Embedding and Reranking:
  - CLIP for image-text embeddings
  - Sentence Transformers for dense embeddings
  - SPLADE for sparse embeddings
  - BGE Reranker for final ranking against the user query
- LLM Output: Claude 3.5 Sonnet for natural-language recommendations
- Vector DB: Pinecone indexes for dense, sparse, and image (CLIP) embeddings
- Deployment: Railway (backend) + Vercel (frontend)
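This README does not show the generation step, where reranked venues become context for Claude 3.5 Sonnet. The sketch below is one way to format that context. It assumes each place carries the `name`, `neighborhood`, `description`, and optional `reviews` fields seen in the example `/search` response. `build_context` and the prompt wording are hypothetical, not the project's actual prompt.

```python
from typing import Dict, List


def build_context(query: str, places: List[Dict], max_places: int = 8) -> str:
    """Format retrieved venues into a RAG prompt (hypothetical format).

    max_places defaults to 8 to match the reranker's top_n in the pipeline.
    """
    lines = [f"User query: {query}", "", "Candidate venues:"]
    for n, place in enumerate(places[:max_places], start=1):
        # One numbered line per venue, using fields from the /search response.
        lines.append(f"{n}. {place['name']} ({place['neighborhood']}): {place['description']}")
        if place.get("reviews"):
            lines.append(f"   Reviews: {place['reviews']}")
    lines += ["", "Recommend the venues that best match the query, naming each venue's neighborhood."]
    return "\n".join(lines)
```

The returned string would then be sent to Claude as the user message. Grounding the answer in retrieved venues is what keeps the recommendations tied to real places.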
The core search pipeline (excerpt; it runs inside an async request handler):

```python
import asyncio  # needed for asyncio.gather below

# 1. Embed the query three ways
clip_embedding = clip_model.encode_text(query)    # CLIP (image/text)
sparse_embedding = sparse_model.embed(query)      # SPLADE (sparse)
dense_embedding = metadata_model.encode(query)    # SentenceTransformer (dense)

# 2. Query the three Pinecone indexes concurrently
image_q = pinecone_image_query(clip_embedding)
sparse_q = pinecone_sparse_query(sparse_embedding)
dense_q = pinecone_dense_query(dense_embedding)
image_results, sparse_results, dense_results = await asyncio.gather(
    image_q, sparse_q, dense_q
)

# 3. Merge the hybrid search results into one candidate set
hybrid_results = merge_matches(dense_results, sparse_results, image_results)

# 4. Rerank the candidates against the query with BGE and keep the top 8
reranked = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query=query,
    documents=list(hybrid_results.keys()),
    top_n=8,
)
```
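The excerpt calls `merge_matches` without showing its body. The three indexes return scores on different scales: a SPLADE dot product, for instance, is not bounded the way a cosine similarity is. One common way to fuse such lists is reciprocal rank fusion (RRF), which uses only each item's rank. The sketch below is RRF, a swapped-in technique, and not necessarily what `merge_matches` does. It assumes each result list holds dicts with `id` and `score`, sorted best-first; `rrf_merge` is a hypothetical name.

```python
from collections import defaultdict


def rrf_merge(*result_lists, k=60, top_n=None):
    """Fuse ranked match lists with reciprocal rank fusion (hypothetical helper).

    Each list holds dicts like {"id": ..., "score": ...} sorted best-first.
    Raw scores are ignored; an item gains 1 / (k + rank) from every list it
    appears in, so items ranked well by several retrievers rise to the top.
    """
    fused = defaultdict(float)
    for matches in result_lists:
        for rank, match in enumerate(matches, start=1):
            fused[match["id"]] += 1.0 / (k + rank)
    # Highest fused score first; returned as an ordered dict keyed by id.
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return dict(ranked)
```

`k = 60` is the constant usually used with RRF. A larger `k` flattens the difference between top and lower ranks. Because only ranks matter, no per-index score normalization is needed before the reranker sees the candidates.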
Prerequisites:

- Python 3.8+
- Node.js 18+
- Pinecone API key
- Anthropic API key
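Backend setup:

- Navigate to the backend directory:
  ```bash
  cd backend
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set the environment variables (`PERSONAL_ANTHROPIC` holds the Anthropic API key):
  ```
  PINECONE_API_KEY=your_pinecone_api_key
  PERSONAL_ANTHROPIC=your_anthropic_api_key
  ```
- Run the server:
  ```bash
  uvicorn main:app --reload
  ```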
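This README does not show how the backend reads these keys. The sketch below assumes they come from the process environment and checks for them at startup, so a missing key fails with a clear message instead of a later Pinecone or Anthropic auth error. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    pinecone_api_key: str
    anthropic_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the backend's API keys, failing fast if any is missing or empty."""
    required = ("PINECONE_API_KEY", "PERSONAL_ANTHROPIC")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        pinecone_api_key=env["PINECONE_API_KEY"],
        anthropic_api_key=env["PERSONAL_ANTHROPIC"],  # Anthropic key uses this name
    )
```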
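Frontend setup:

- Navigate to the frontend directory:
  ```bash
  cd frontend
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Set the environment variable pointing at the backend (uvicorn serves on port 8000 by default):
  ```
  NEXT_PUBLIC_API_URL=http://localhost:8000
  ```
- Run the development server:
  ```bash
  npm run dev
  ```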
API endpoints:

Search for venues:

```http
POST /search
Content-Type: application/json

{
  "query": "where to drink a matcha"
}
```

Response:

```json
{
  "llm_response": "I think you'd like the following:\n\n1. Matcha Bar NYC in East Village: This cozy matcha cafe offers traditional Japanese vibes with excellent matcha drinks and a serene atmosphere perfect for relaxing or studying.\n\n2. Cha Cha Matcha in SoHo: Known for their Instagram-worthy matcha creations and trendy atmosphere.\n\n3. Matchaful in West Village: Offers high-quality ceremonial grade matcha with a minimalist aesthetic.\n\nThese spots all offer authentic matcha experiences with different vibes - from traditional to trendy. Perfect for matcha enthusiasts!",
  "places": [
    {
      "place_id": "venue_123",
      "name": "Matcha Bar NYC",
      "neighborhood": "East Village",
      "description": "Cozy matcha cafe with traditional Japanese vibes",
      "reviews": "['Amazing matcha quality', 'Great atmosphere for studying', 'Authentic Japanese experience']",
      "emoji": "🍵",
      "score": 0.95
    }
  ],
  "total_results": 15,
  "query": "where to drink a matcha"
}
```
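In the response above, `reviews` is a string containing a Python-style list with single quotes, so `json.loads` cannot parse it. The sketch below builds the `POST /search` request and converts `reviews` with `ast.literal_eval`, which evaluates literals only. It uses only the standard library. `build_search_request` and `parse_places` are hypothetical client helpers, not part of the project.

```python
from __future__ import annotations

import ast
import json
import urllib.request


def build_search_request(base_url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST /search request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_places(response: dict) -> list[dict]:
    """Return the response's places, with `reviews` converted to a real list."""
    places = []
    for place in response.get("places", []):
        place = dict(place)  # copy so the caller's response is not mutated
        reviews = place.get("reviews")
        if isinstance(reviews, str):
            place["reviews"] = ast.literal_eval(reviews)
        places.append(place)
    return places
```

To call a running backend, pass the request to `urllib.request.urlopen`, read the body with `json.load`, and hand the result to `parse_places`.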
Health check:

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "message": "API is running"
}
```
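Right after `uvicorn main:app --reload`, the server may need a moment before it answers requests. The sketch below polls `/health` until it reports `"status": "healthy"` or a timeout expires, which can be useful in scripts or CI. `wait_until_healthy` is a hypothetical helper built on the standard library only.

```python
import json
import time
import urllib.request


def wait_until_healthy(base_url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll GET /health until it returns {"status": "healthy"}; False on timeout."""
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                if json.load(resp).get("status") == "healthy":
                    return True
        except (OSError, ValueError):
            # OSError covers refused connections, URLError/HTTPError and
            # socket timeouts; ValueError covers a non-JSON body.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```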
Deployment:

- The backend is deployed automatically to Railway using the configuration in `railway.json`.
- The frontend can be deployed to Vercel using the configuration in `vercel.json`.
The original datathon solution was implemented in a Jupyter notebook (`datathon.ipynb`) and demonstrated:
- RAG recommendation system for NYC venues
- Hybrid dense and sparse search
- Image-text understanding with CLIP
- FAISS index for vector storage and retrieval