feat: resolve issues #157 #159 #161 #162 — search locations, item recommender, incremental ETL, row validation #170

Merged
BigBen-7 merged 2 commits into Lead-Studios:main from portableDD:feat/issues-157-159-161-162
Mar 26, 2026

@portableDD
Contributor

Summary

This PR resolves four related enhancement issues across the search, recommendation, and ETL subsystems.

Issue #157 — Expand location detection beyond hardcoded cities

  • Added KNOWN_LOCATIONS env var (comma-separated) to Settings with 20 default Nigerian cities (original 10 + Owerri, Warri, Uyo, Akure, Ilorin, Sokoto, Zaria, Maiduguri, Asaba, Nnewi)
  • extract_keywords() now loads the city list from KNOWN_LOCATIONS instead of a hardcoded list
  • Unknown cities (after preposition keywords like "in", "at", "near") are captured as fuzzy_locations and substring-matched against event.location / event.city in filter_events_by_keywords()
  • Added city field to every mock event and added two new extended-city events (Owerri, Warri)
  • Tests: tests/test_search_location.py — known-city exact match, unknown-city fuzzy match, no-location query, KNOWN_LOCATIONS env override
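
A minimal sketch of the location handling described above, not the code in this PR. The helper names `load_known_locations`, `extract_locations`, and `matches_fuzzy` are hypothetical, the default list is a four-city subset of the 20 defaults, and events are assumed to be dicts with `location` and `city` keys.

```python
import os
import re

# Assumption: a small subset of the PR's 20 default cities, for illustration only.
DEFAULT_LOCATIONS = "Owerri,Warri,Uyo,Akure"
PREPOSITIONS = ("in", "at", "near")


def load_known_locations():
    """Read the comma-separated KNOWN_LOCATIONS env var as a lowercase set."""
    raw = os.environ.get("KNOWN_LOCATIONS", DEFAULT_LOCATIONS)
    return {city.strip().lower() for city in raw.split(",") if city.strip()}


def extract_locations(query, known=None):
    """Return (exact, fuzzy): known cities, and unknown words that follow a preposition."""
    known = load_known_locations() if known is None else known
    tokens = re.findall(r"[a-z]+", query.lower())
    exact, fuzzy = [], []
    for i, tok in enumerate(tokens):
        if tok in known:
            exact.append(tok)
        elif i > 0 and tokens[i - 1] in PREPOSITIONS and tok not in PREPOSITIONS:
            fuzzy.append(tok)
    return exact, fuzzy


def matches_fuzzy(event, fuzzy_locations):
    """Substring-match fuzzy locations against the event's location and city."""
    haystack = f"{event.get('location', '')} {event.get('city', '')}".lower()
    return any(loc in haystack for loc in fuzzy_locations)
```

This sketch only handles single-word city names, and it will treat any word after a preposition as a candidate location (for example "the" in "in the park"), so a real implementation would likely need a stop-word filter.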

Issue #159 — Replace user-based collaborative filter with item-based similarity

  • New src/recommender.py with build_item_similarity_matrix() (pure-Python cosine similarity, no numpy) and get_item_recommendations() with cold-start fallback
  • Cold-start (user has no history or is unknown): returns the 3 most popular events by purchase count
  • get_user_events_from_db() stub queries user_event_purchases via SQLAlchemy and falls back to an empty dict when the table is absent
  • /recommend-events endpoint uses the new item-based algorithm; mock_user_events remains as a fallback when the DB is unavailable
  • Updated tests/test_recommend.py: unknown user now returns 200 with cold-start recommendations instead of 404
  • Tests: tests/test_recommender.py — similarity matrix correctness, symmetry, top-3 output, cold-start paths, edge cases
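
For reviewers unfamiliar with item-based filtering, here is a rough pure-Python sketch of the idea. The function names match the ones in this PR, but the signatures, the `user_events` shape (user ID mapped to a list of event IDs), and the tie-breaking by event ID are assumptions and may differ from src/recommender.py.

```python
import math
from collections import Counter


def build_item_similarity_matrix(user_events):
    """Cosine similarity between events, each treated as a binary vector over users."""
    buyers = {}
    for user, events in user_events.items():
        for ev in set(events):
            buyers.setdefault(ev, set()).add(user)
    sim = {ev: {} for ev in buyers}
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                sim[a][b] = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sim


def get_item_recommendations(user_id, user_events, top_n=3):
    """Score unseen events by similarity to the user's history; fall back to popularity."""
    history = set(user_events.get(user_id, []))
    if not history:
        # Cold start: unknown user or empty history, so rank by purchase count.
        counts = Counter(ev for evs in user_events.values() for ev in evs)
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [ev for ev, _ in ranked[:top_n]]
    sim = build_item_similarity_matrix(user_events)
    scores = Counter()
    for seen in history:
        for other, s in sim.get(seen, {}).items():
            if other not in history and s > 0:
                scores[other] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ev for ev, _ in ranked[:top_n]]
```

With binary purchase vectors, cosine similarity reduces to the number of shared buyers divided by the square root of the product of the two buyer counts, which is why no numpy is needed.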

Issue #161 — Incremental ETL extract using a cursor

  • extract_events_and_sales(since=None) accepts an optional ISO-8601 cursor; when provided, ?since=<ISO> is forwarded to both /events and /ticket-sales
  • run_etl_once() creates the etl_run_log table (if it does not exist), reads the last successful finished_at as the cursor, and writes a new log row after every run (success or failure)
  • Cursor is only advanced on a successful run; a failed run leaves the previous cursor intact so the next retry re-fetches the same window
  • Tests: tests/test_etl_incremental.py — no-cursor first run, cursor-forwarded subsequent run, failed-run cursor not advanced
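
The cursor behavior can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses the standard-library `sqlite3` in place of Postgres, the `read_cursor` helper, the `extract` callback, and the `now` parameter are hypothetical, and the table layout beyond `finished_at`, `status`, and `rejected_count` is guessed.

```python
import sqlite3
from datetime import datetime, timezone


def read_cursor(conn):
    """Create etl_run_log if needed and return the last successful finished_at, or None."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_run_log ("
        "id INTEGER PRIMARY KEY, finished_at TEXT, status TEXT, rejected_count INTEGER)"
    )
    row = conn.execute(
        "SELECT MAX(finished_at) FROM etl_run_log WHERE status = 'success'"
    ).fetchone()
    return row[0]


def run_etl_once(conn, extract, now=None):
    """Run one cycle: pass the cursor to extract, then log the outcome either way."""
    since = read_cursor(conn)
    status = "success"
    try:
        extract(since)  # a real extract would forward ?since=<ISO> to the API
    except Exception:
        status = "failed"
    finished_at = (now or datetime.now(timezone.utc)).isoformat()
    conn.execute(
        "INSERT INTO etl_run_log (finished_at, status, rejected_count) VALUES (?, ?, 0)",
        (finished_at, status),
    )
    conn.commit()
    return status
```

Because the cursor query filters on `status = 'success'`, a failed run still gets a log row but never moves the cursor. Taking `MAX` over ISO-8601 text relies on every timestamp using the same format and UTC offset.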

Issue #162 — ETL data validation step between transform and load

  • validate_rows(event_summary_rows, daily_rows) returns (valid_ev, valid_daily, rejected_count):
    • Rejects event rows with empty/None event_id, negative total_tickets, or negative total_revenue
    • Rejects daily rows with empty event_id or sale_date more than 1 day in the future
    • Every rejected row is logged as a WARNING for full traceability — nothing is silently dropped
  • run_etl_once() calls validate_rows before load_postgres; rejected_count is stored in etl_run_log
  • Tests: tests/test_etl_validation.py — all rejection rules, boundary dates, mixed batches, empty inputs
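
A compact sketch of the rejection rules listed above. It assumes rows are dicts, that `sale_date` is always present as a `datetime.date`, and that missing numeric fields count as zero. The `today` parameter and the logger name are hypothetical additions to keep the example deterministic, so the real `validate_rows` may differ.

```python
import logging
from datetime import date, timedelta

logger = logging.getLogger("etl.validation")  # assumed logger name


def validate_rows(event_summary_rows, daily_rows, today=None):
    """Return (valid_ev, valid_daily, rejected_count), logging each rejected row."""
    today = today or date.today()
    max_date = today + timedelta(days=1)  # exactly one day ahead is still accepted
    valid_ev, valid_daily, rejected_count = [], [], 0

    for row in event_summary_rows:
        if (
            not row.get("event_id")
            or row.get("total_tickets", 0) < 0
            or row.get("total_revenue", 0) < 0
        ):
            logger.warning("Rejected event summary row: %r", row)
            rejected_count += 1
        else:
            valid_ev.append(row)

    for row in daily_rows:
        if not row.get("event_id") or row["sale_date"] > max_date:
            logger.warning("Rejected daily row: %r", row)
            rejected_count += 1
        else:
            valid_daily.append(row)

    return valid_ev, valid_daily, rejected_count
```

Allowing dates up to one day ahead presumably absorbs timezone skew between the API and the ETL host, while anything further out is treated as bad data.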

Closes #157
Closes #159
Closes #161
Closes #162

Lead-Studios#162

Issue Lead-Studios#157 — expand location detection beyond hardcoded Nigerian cities
- Add KNOWN_LOCATIONS env var (comma-separated) to Settings with 20 default cities
- Replace hardcoded list in extract_keywords() with the configurable setting
- Detect unknown cities via preposition patterns and return them as fuzzy_locations
- filter_events_by_keywords() uses fuzzy substring match when location is unknown
- Add city field to every mock event and add two new extended-city events (Owerri, Warri)
- Add tests/test_search_location.py: known-city exact match, unknown-city fuzzy match, no-location query

Issue Lead-Studios#159 — replace user-based collaborative filter with item-based similarity
- Create src/recommender.py with build_item_similarity_matrix() (pure-Python cosine similarity)
- Implement get_item_recommendations() with cold-start fallback (most popular events)
- Add get_user_events_from_db() stub backed by SQLAlchemy, falls back to empty dict
- Update /recommend-events endpoint to use the new item-based recommender
- Update test_recommend.py: unknown user now returns 200 cold-start recs instead of 404
- Add tests/test_recommender.py: similarity matrix, correct top-3, cold-start paths

Issue Lead-Studios#161 — incremental ETL extract using a cursor
- extract_events_and_sales() accepts optional since: str param
- ?since=<ISO> is forwarded to both /events and /ticket-sales when provided
- run_etl_once() creates etl_run_log table, reads last successful finished_at cursor,
  passes it to extract, and writes a new log row (status + rejected_count) on completion
- Cursor is only advanced on success; failed runs leave the previous cursor intact
- Add tests/test_etl_incremental.py: no-cursor first run, cursor-forwarded run, failed run

Issue Lead-Studios#162 — ETL data validation step between transform and load
- Add validate_rows(event_summary_rows, daily_rows) → (valid_ev, valid_daily, rejected_count)
  - Rejects event rows with empty/None event_id, negative total_tickets, negative total_revenue
  - Rejects daily rows with empty event_id or sale_date more than 1 day in the future
  - Every rejection is logged as a warning for traceability
- run_etl_once() calls validate_rows before load_postgres; rejected_count stored in run log
- Add tests/test_etl_validation.py covering all rejection rules and mixed batches

Closes Lead-Studios#157
Closes Lead-Studios#159
Closes Lead-Studios#161
Closes Lead-Studios#162
@drips-wave

drips-wave bot commented Mar 26, 2026

@portableDD Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits

@BigBen-7 BigBen-7 merged commit 504b6a7 into Lead-Studios:main Mar 26, 2026
1 of 4 checks passed