feat: resolve issues #157 #159 #161 #162 - search locations, item recommender, incremental ETL, row validation #170
Merged
BigBen-7 merged 2 commits into Lead-Studios:main on Mar 26, 2026
Issue Lead-Studios#157: expand location detection beyond hardcoded Nigerian cities
- Add KNOWN_LOCATIONS env var (comma-separated) to Settings with 20 default cities
- Replace hardcoded list in extract_keywords() with the configurable setting
- Detect unknown cities via preposition patterns and return them as fuzzy_locations
- filter_events_by_keywords() uses fuzzy substring match when location is unknown
- Add city field to every mock event and add two new extended-city events (Owerri, Warri)
- Add tests/test_search_location.py: known-city exact match, unknown-city fuzzy match, no-location query

Issue Lead-Studios#159: replace user-based collaborative filter with item-based similarity
- Create src/recommender.py with build_item_similarity_matrix() (pure-Python cosine similarity)
- Implement get_item_recommendations() with cold-start fallback (most popular events)
- Add get_user_events_from_db() stub backed by SQLAlchemy, falls back to empty dict
- Update /recommend-events endpoint to use the new item-based recommender
- Update test_recommend.py: unknown user now returns 200 cold-start recs instead of 404
- Add tests/test_recommender.py: similarity matrix, correct top-3, cold-start paths

Issue Lead-Studios#161: incremental ETL extract using a cursor
- extract_events_and_sales() accepts optional since: str param
- ?since=<ISO> is forwarded to both /events and /ticket-sales when provided
- run_etl_once() creates etl_run_log table, reads last successful finished_at cursor, passes it to extract, and writes a new log row (status + rejected_count) on completion
- Cursor is only advanced on success; failed runs leave the previous cursor intact
- Add tests/test_etl_incremental.py: no-cursor first run, cursor-forwarded run, failed run

Issue Lead-Studios#162: ETL data validation step between transform and load
- Add validate_rows(event_summary_rows, daily_rows) → (valid_ev, valid_daily, rejected_count)
- Rejects event rows with empty/None event_id, negative total_tickets, negative total_revenue
- Rejects daily rows with empty event_id or sale_date more than 1 day in the future
- Every rejection is logged as a warning for traceability
- run_etl_once() calls validate_rows before load_postgres; rejected_count stored in run log
- Add tests/test_etl_validation.py covering all rejection rules and mixed batches

Closes Lead-Studios#157
Closes Lead-Studios#159
Closes Lead-Studios#161
Closes Lead-Studios#162
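The cursor handling listed under Issue #161 could look roughly like the sketch below. It is not the code from this PR: it uses an in-memory-friendly `sqlite3` connection instead of Postgres so it runs anywhere, and the helper names (`ensure_run_log`, `read_cursor`, `build_params`, `record_run`) and the status strings `'success'` / `'failed'` are assumptions. Only the table name `etl_run_log` and the columns `finished_at`, `status`, and `rejected_count` come from the description.

```python
import sqlite3


def ensure_run_log(conn):
    """Create the run-log table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS etl_run_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               finished_at TEXT NOT NULL,
               status TEXT NOT NULL,
               rejected_count INTEGER NOT NULL DEFAULT 0
           )"""
    )


def read_cursor(conn):
    """Return finished_at of the latest successful run, or None on a first run.

    MAX() on text compares lexicographically, so this assumes every timestamp
    is stored in the same ISO-8601 format and timezone.
    """
    row = conn.execute(
        "SELECT MAX(finished_at) FROM etl_run_log WHERE status = 'success'"
    ).fetchone()
    return row[0]


def build_params(since=None):
    """Query parameters for /events and /ticket-sales; empty without a cursor."""
    return {"since": since} if since else {}


def record_run(conn, finished_at, status, rejected_count=0):
    """Log every run; only 'success' rows are ever read back as a cursor."""
    conn.execute(
        "INSERT INTO etl_run_log (finished_at, status, rejected_count) "
        "VALUES (?, ?, ?)",
        (finished_at, status, rejected_count),
    )
    conn.commit()
```

Because `read_cursor` filters on the status column, a failed run still gets a log row but never moves the cursor, which matches the "failed runs leave the previous cursor intact" behaviour described above.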
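The rejection rules listed under Issue #162 can be illustrated with a minimal sketch. It assumes rows are plain dicts and that `sale_date` is a `date`, a `datetime`, or an ISO `YYYY-MM-DD` string; the optional `today` parameter and the two private helpers are hypothetical additions for testability and are not part of the signature given in the description.

```python
import logging
from datetime import date, datetime, timedelta

logger = logging.getLogger("etl.validation")


def _is_blank(value):
    """True for None or an empty/whitespace-only value."""
    return value is None or str(value).strip() == ""


def _as_date(value):
    """Normalise a date, datetime, or ISO 'YYYY-MM-DD...' string to a date."""
    if isinstance(value, datetime):  # checked first: datetime subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(str(value)[:10])


def validate_rows(event_summary_rows, daily_rows, today=None):
    """Return (valid_ev, valid_daily, rejected_count), logging each rejection."""
    today = today or date.today()
    latest_allowed = today + timedelta(days=1)
    valid_ev, valid_daily, rejected = [], [], 0

    for row in event_summary_rows:
        if _is_blank(row.get("event_id")):
            reason = "missing event_id"
        elif (row.get("total_tickets") or 0) < 0:
            reason = "negative total_tickets"
        elif (row.get("total_revenue") or 0) < 0:
            reason = "negative total_revenue"
        else:
            valid_ev.append(row)
            continue
        rejected += 1
        logger.warning("Rejected event row %r: %s", row, reason)

    for row in daily_rows:
        if _is_blank(row.get("event_id")):
            reason = "missing event_id"
        elif _as_date(row["sale_date"]) > latest_allowed:
            reason = "sale_date more than 1 day in the future"
        else:
            valid_daily.append(row)
            continue
        rejected += 1
        logger.warning("Rejected daily row %r: %s", row, reason)

    return valid_ev, valid_daily, rejected
```

In this sketch a `sale_date` exactly one day ahead is still accepted; only dates beyond that boundary are rejected.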
@portableDD Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits. You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀
BigBen-7 approved these changes on Mar 26, 2026
Summary
This PR resolves four related enhancement issues across the search, recommendation, and ETL subsystems.
Issue #157: Expand location detection beyond hardcoded cities
- Added `KNOWN_LOCATIONS` env var (comma-separated) to `Settings` with 20 default Nigerian cities (original 10 + Owerri, Warri, Uyo, Akure, Ilorin, Sokoto, Zaria, Maiduguri, Asaba, Nnewi)
- `extract_keywords()` now loads the city list from `KNOWN_LOCATIONS` instead of a hardcoded list
- Unknown cities detected via preposition patterns are returned as `fuzzy_locations` and substring-matched against `event.location` / `event.city` in `filter_events_by_keywords()`
- Added `city` field to every mock event and added two new extended-city events (Owerri, Warri)
- Added `tests/test_search_location.py`: known-city exact match, unknown-city fuzzy match, no-location query, `KNOWN_LOCATIONS` env override

Issue #159: Replace user-based collaborative filter with item-based similarity
- Added `src/recommender.py` with `build_item_similarity_matrix()` (pure-Python cosine similarity, no numpy) and `get_item_recommendations()` with cold-start fallback
- `get_user_events_from_db()` stub queries `user_event_purchases` via SQLAlchemy and falls back to an empty dict when the table is absent
- `/recommend-events` endpoint uses the new item-based algorithm; `mock_user_events` remains as a fallback when the DB is unavailable
- Updated `tests/test_recommend.py`: unknown user now returns `200` with cold-start recommendations instead of `404`
- Added `tests/test_recommender.py`: similarity matrix correctness, symmetry, top-3 output, cold-start paths, edge cases

Issue #161: Incremental ETL extract using a cursor
- `extract_events_and_sales(since=None)` accepts an optional ISO-8601 cursor; when provided, `?since=<ISO>` is forwarded to both `/events` and `/ticket-sales`
- `run_etl_once()` creates the `etl_run_log` table (if it does not exist), reads the last successful `finished_at` as the cursor, and writes a new log row after every run (success or failure)
- Added `tests/test_etl_incremental.py`: no-cursor first run, cursor-forwarded subsequent run, failed-run cursor not advanced

Issue #162: ETL data validation step between transform and load
- `validate_rows(event_summary_rows, daily_rows)` returns `(valid_ev, valid_daily, rejected_count)`:
  - Rejects event rows with empty/`None` `event_id`, negative `total_tickets`, or negative `total_revenue`
  - Rejects daily rows with empty `event_id` or `sale_date` more than 1 day in the future
  - Every rejection is logged as `WARNING` for full traceability; nothing is silently dropped
- `run_etl_once()` calls `validate_rows` before `load_postgres`; `rejected_count` is stored in `etl_run_log`
- Added `tests/test_etl_validation.py`: all rejection rules, boundary dates, mixed batches, empty inputs

Closes #157
Closes #159
Closes #161
Closes #162
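For readers unfamiliar with the approach in Issue #157, here is a small sketch of configurable location detection with a fuzzy fallback. It is an illustration under assumptions, not the PR's `extract_keywords()`: the function names, the preposition list, the skip-word set, the shortened default city list, and the exact-match rule for known cities are all hypothetical. Only the `KNOWN_LOCATIONS` variable, the comma-separated format, and the substring match against `location` / `city` come from the summary.

```python
import os
import re

# Shortened illustrative default; the PR ships 20 cities.
DEFAULT_LOCATIONS = "Owerri,Warri,Uyo,Akure"

# Assumed preposition pattern: "in X", "at X", "near X", "around X".
_PREPOSITION_RE = re.compile(
    r"\b(?:in|at|near|around)\s+([A-Za-z][A-Za-z\-]*)", re.IGNORECASE
)
# Words that follow a preposition but are clearly not places (assumed list).
_NOT_PLACES = {"the", "a", "an", "my", "this"}


def load_known_locations(env=None):
    """Parse the comma-separated KNOWN_LOCATIONS value into lowercase names."""
    env = os.environ if env is None else env
    raw = env.get("KNOWN_LOCATIONS", DEFAULT_LOCATIONS)
    return [city.strip().lower() for city in raw.split(",") if city.strip()]


def extract_locations(query, known):
    """Return (known cities found, unknown candidate cities) for a query."""
    text = query.lower()
    locations = [c for c in known if re.search(rf"\b{re.escape(c)}\b", text)]
    fuzzy = [
        word.lower()
        for word in _PREPOSITION_RE.findall(query)
        if word.lower() not in known and word.lower() not in _NOT_PLACES
    ]
    return locations, fuzzy


def event_matches(event, locations, fuzzy):
    """Exact city match for known cities, substring match for unknown ones."""
    city = str(event.get("city", "")).lower()
    haystack = f"{event.get('location', '')} {city}".lower()
    if locations:
        return city in locations
    if fuzzy:
        return any(term in haystack for term in fuzzy)
    return True  # no location in the query: do not filter on location
```

Passing a plain dict as `env` makes the override easy to exercise in a test without touching the real process environment.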
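The item-based recommender from Issue #159 can be sketched in pure Python as follows. This is a simplified illustration, assuming purchases are binary (a user either bought an event or did not) and that `user_events` maps user IDs to lists of string event IDs. The two function names come from the summary, but their parameters and the tie-breaking by event ID are assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_item_similarity_matrix(user_events):
    """Cosine similarity between events, treating each event as a set of buyers."""
    buyers = defaultdict(set)
    for user, events in user_events.items():
        for event in events:
            buyers[event].add(user)

    sim = {event: {} for event in buyers}
    items = sorted(buyers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                # Binary vectors: dot product is the overlap, norms are sqrt(size).
                score = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
                sim[a][b] = sim[b][a] = score  # symmetric by construction
    return sim


def get_item_recommendations(user_id, user_events, top_n=3):
    """Rank unseen events by summed similarity to the user's events."""
    seen = set(user_events.get(user_id, ()))
    if not seen:
        # Cold start: fall back to the most popular events overall.
        popularity = Counter(e for evs in user_events.values() for e in set(evs))
        ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
        return [event for event, _ in ranked[:top_n]]

    sim = build_item_similarity_matrix(user_events)
    scores = defaultdict(float)
    for event in seen:
        for other, score in sim.get(event, {}).items():
            if other not in seen:
                scores[other] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [event for event, _ in ranked[:top_n]]
```

An unknown user takes the cold-start branch and still receives a list, which is consistent with the endpoint now answering `200` instead of `404`.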