feat: resolve issues #157 #159 #161 #162 — search locations, item recommender, incremental ETL, row validation by portableDD · Pull Request #170 · Lead-Studios/veritix-python

portableDD · 2026-03-26T20:17:04Z

Summary

This PR resolves four related enhancement issues across the search, recommendation, and ETL subsystems.

Issue #157 — Expand location detection beyond hardcoded cities

- Added KNOWN_LOCATIONS env var (comma-separated) to Settings with 20 default Nigerian cities (original 10 + Owerri, Warri, Uyo, Akure, Ilorin, Sokoto, Zaria, Maiduguri, Asaba, Nnewi)
- extract_keywords() now loads the city list from KNOWN_LOCATIONS instead of a hardcoded list
- Unknown cities (words following a preposition such as "in", "at", or "near") are captured as fuzzy_locations and substring-matched against event.location / event.city in filter_events_by_keywords()
- Added a city field to every mock event and added two new extended-city events (Owerri, Warri)
- Tests: tests/test_search_location.py — known-city exact match, unknown-city fuzzy match, no-location query, KNOWN_LOCATIONS env override
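
The location flow described above can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: load_known_locations(), extract_locations(), filter_events(), the shortened default city list, and the dict-shaped events are all hypothetical stand-ins for Settings, extract_keywords(), and filter_events_by_keywords().

```python
import os
import re

# Shortened illustrative default; the PR ships 20 cities.
DEFAULT_LOCATIONS = "Lagos,Abuja,Owerri,Warri"

# A word that follows "in", "at", or "near" is treated as a location candidate.
PREPOSITION_PATTERN = re.compile(r"\b(?:in|at|near)\s+([a-z][a-z\-]*)", re.IGNORECASE)


def load_known_locations(env=None):
    """Read the comma-separated KNOWN_LOCATIONS value into a lowercase set."""
    env = os.environ if env is None else env
    raw = env.get("KNOWN_LOCATIONS", DEFAULT_LOCATIONS)
    return {city.strip().lower() for city in raw.split(",") if city.strip()}


def extract_locations(query, known):
    """Return (known_locations, fuzzy_locations) found in a search query.

    Only single-word city names are handled in this sketch.
    """
    words = re.findall(r"[a-z][a-z\-]*", query.lower())
    locations = [w for w in words if w in known]
    fuzzy = [
        m.lower()
        for m in PREPOSITION_PATTERN.findall(query)
        if m.lower() not in known
    ]
    return locations, fuzzy


def filter_events(events, locations, fuzzy):
    """Exact match on city for known locations, substring match for fuzzy ones."""
    if not locations and not fuzzy:
        return list(events)
    matched = []
    for event in events:
        city = event.get("city", "").lower()
        location = event.get("location", "").lower()
        exact = city in locations
        partial = any(f in city or f in location for f in fuzzy)
        if exact or partial:
            matched.append(event)
    return matched
```

A real implementation would likely also filter stop words, since a query such as "at the stadium" would make this sketch treat "the" as a fuzzy location.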

Issue #159 — Replace user-based collaborative filter with item-based similarity

- New src/recommender.py with build_item_similarity_matrix() (pure-Python cosine similarity, no numpy) and get_item_recommendations() with cold-start fallback
- Cold-start (user has no history or is unknown): returns the 3 most popular events by purchase count
- get_user_events_from_db() stub queries user_event_purchases via SQLAlchemy and falls back to an empty dict when the table is absent
- /recommend-events endpoint uses the new item-based algorithm; mock_user_events remains as a fallback when the DB is unavailable
- Updated tests/test_recommend.py: an unknown user now returns 200 with cold-start recommendations instead of 404
- Tests: tests/test_recommender.py — similarity matrix correctness, symmetry, top-3 output, cold-start paths, edge cases
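
For readers unfamiliar with item-based filtering, here is a minimal sketch of the idea. The function names match the PR, but the signatures, the dict-of-lists input shape, and the tie-breaking by event id are assumptions made for illustration; the actual src/recommender.py may differ. With binary purchase vectors, the cosine similarity of two events reduces to the number of shared buyers divided by the square root of the product of their buyer counts.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_item_similarity_matrix(user_events):
    """Cosine similarity between events, based on which users bought them.

    user_events maps user id -> list of event ids (assumed shape).
    """
    buyers = {}
    for user, events in user_events.items():
        for event in set(events):
            buyers.setdefault(event, set()).add(user)

    similarity = {event: {} for event in buyers}
    for a, b in combinations(sorted(buyers), 2):
        overlap = len(buyers[a] & buyers[b])
        if overlap:
            score = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
            similarity[a][b] = score
            similarity[b][a] = score  # the matrix is symmetric
    return similarity


def get_item_recommendations(user_id, user_events, top_n=3):
    """Recommend events similar to the user's history, most similar first."""
    history = set(user_events.get(user_id, []))
    if not history:
        # Cold start: fall back to the most purchased events overall.
        popularity = Counter(e for events in user_events.values() for e in events)
        return sorted(popularity, key=lambda e: (-popularity[e], e))[:top_n]

    similarity = build_item_similarity_matrix(user_events)
    scores = {}
    for owned in history:
        for candidate, score in similarity.get(owned, {}).items():
            if candidate not in history:
                scores[candidate] = scores.get(candidate, 0.0) + score
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_n]
```

In this sketch a user whose purchases overlap with nobody else's gets an empty list; whether the PR falls back to popular events in that case is not stated in the description.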

Issue #161 — Incremental ETL extract using a cursor

- extract_events_and_sales(since=None) accepts an optional ISO-8601 cursor; when provided, ?since=<ISO> is forwarded to both /events and /ticket-sales
- run_etl_once() creates the etl_run_log table (if it does not exist), reads the last successful finished_at as the cursor, and writes a new log row after every run (success or failure)
- The cursor is only advanced on a successful run; a failed run leaves the previous cursor intact so the next retry re-fetches the same window
- Tests: tests/test_etl_incremental.py — no-cursor first run, cursor-forwarded subsequent run, failed-run cursor not advanced
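
The cursor behaviour can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database in place of Postgres, takes the extract step as an injected callable, and assumes an etl_run_log layout with only status, finished_at, and rejected_count columns; the real run_etl_once() signature and schema may differ.

```python
import sqlite3
from datetime import datetime, timezone


def ensure_run_log(conn):
    """Create the run log table on first use (assumed minimal schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_run_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " status TEXT NOT NULL,"
        " finished_at TEXT NOT NULL,"
        " rejected_count INTEGER NOT NULL DEFAULT 0)"
    )


def read_cursor(conn):
    """Return finished_at of the latest successful run, or None on a first run."""
    row = conn.execute(
        "SELECT finished_at FROM etl_run_log"
        " WHERE status = 'success' ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def run_etl_once(conn, extract, now=None):
    """Run one cycle: read the cursor, extract since it, then log the outcome."""
    ensure_run_log(conn)
    since = read_cursor(conn)
    status = "success"
    try:
        extract(since)  # stand-in for extract_events_and_sales(since=...)
    except Exception:
        status = "failed"
    finished_at = (now or datetime.now(timezone.utc)).isoformat()
    # Every run is logged, but read_cursor() only considers successful rows,
    # so a failed run never moves the cursor forward.
    conn.execute(
        "INSERT INTO etl_run_log (status, finished_at) VALUES (?, ?)",
        (status, finished_at),
    )
    conn.commit()
    return status, since
```

Because failed rows are ignored when reading the cursor, a retry after a failure receives the same since value as the failed attempt.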

Issue #162 — ETL data validation step between transform and load

- validate_rows(event_summary_rows, daily_rows) returns (valid_ev, valid_daily, rejected_count):
  - Rejects event rows with empty/None event_id, negative total_tickets, or negative total_revenue
  - Rejects daily rows with empty event_id or sale_date more than 1 day in the future
  - Every rejected row is logged as a WARNING for full traceability — nothing is silently dropped
- run_etl_once() calls validate_rows before load_postgres; rejected_count is stored in etl_run_log
- Tests: tests/test_etl_validation.py — all rejection rules, boundary dates, mixed batches, empty inputs
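
A minimal sketch of the rejection rules listed above. It assumes rows are plain dicts, that sale_date is a datetime.date, and it adds a hypothetical today parameter so the boundary can be tested deterministically; the logger name is also made up. The PR's actual validate_rows() may treat types and missing fields differently.

```python
import logging
from datetime import date, timedelta

logger = logging.getLogger("etl.validation")  # hypothetical logger name


def validate_rows(event_summary_rows, daily_rows, today=None):
    """Split rows into valid and rejected; return (valid_ev, valid_daily, rejected_count)."""
    today = today or date.today()
    latest_allowed = today + timedelta(days=1)  # up to 1 day ahead is accepted
    valid_ev, valid_daily, rejected_count = [], [], 0

    for row in event_summary_rows:
        if not row.get("event_id"):
            reason = "missing event_id"
        elif row.get("total_tickets", 0) < 0:
            reason = "negative total_tickets"
        elif row.get("total_revenue", 0) < 0:
            reason = "negative total_revenue"
        else:
            valid_ev.append(row)
            continue
        rejected_count += 1
        logger.warning("Rejected event row (%s): %r", reason, row)

    for row in daily_rows:
        if not row.get("event_id"):
            reason = "missing event_id"
        elif row["sale_date"] > latest_allowed:
            reason = "sale_date more than 1 day in the future"
        else:
            valid_daily.append(row)
            continue
        rejected_count += 1
        logger.warning("Rejected daily row (%s): %r", reason, row)

    return valid_ev, valid_daily, rejected_count
```

Returning the rejected count alongside the valid rows lets the caller store it in etl_run_log without re-scanning the input.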

Closes #157
Closes #159
Closes #161
Closes #162

Lead-Studios#162

Issue Lead-Studios#157 — expand location detection beyond hardcoded Nigerian cities
- Add KNOWN_LOCATIONS env var (comma-separated) to Settings with 20 default cities
- Replace hardcoded list in extract_keywords() with the configurable setting
- Detect unknown cities via preposition patterns and return them as fuzzy_locations
- filter_events_by_keywords() uses fuzzy substring match when location is unknown
- Add city field to every mock event and add two new extended-city events (Owerri, Warri)
- Add tests/test_search_location.py: known-city exact match, unknown-city fuzzy match, no-location query

Issue Lead-Studios#159 — replace user-based collaborative filter with item-based similarity
- Create src/recommender.py with build_item_similarity_matrix() (pure-Python cosine similarity)
- Implement get_item_recommendations() with cold-start fallback (most popular events)
- Add get_user_events_from_db() stub backed by SQLAlchemy, falls back to empty dict
- Update /recommend-events endpoint to use the new item-based recommender
- Update test_recommend.py: unknown user now returns 200 cold-start recs instead of 404
- Add tests/test_recommender.py: similarity matrix, correct top-3, cold-start paths

Issue Lead-Studios#161 — incremental ETL extract using a cursor
- extract_events_and_sales() accepts optional since: str param
- ?since=<ISO> is forwarded to both /events and /ticket-sales when provided
- run_etl_once() creates etl_run_log table, reads last successful finished_at cursor, passes it to extract, and writes a new log row (status + rejected_count) on completion
- Cursor is only advanced on success; failed runs leave the previous cursor intact
- Add tests/test_etl_incremental.py: no-cursor first run, cursor-forwarded run, failed run

Issue Lead-Studios#162 — ETL data validation step between transform and load
- Add validate_rows(event_summary_rows, daily_rows) → (valid_ev, valid_daily, rejected_count)
- Rejects event rows with empty/None event_id, negative total_tickets, negative total_revenue
- Rejects daily rows with empty event_id or sale_date more than 1 day in the future
- Every rejection is logged as a warning for traceability
- run_etl_once() calls validate_rows before load_postgres; rejected_count stored in run log
- Add tests/test_etl_validation.py covering all rejection rules and mixed batches

Closes Lead-Studios#157
Closes Lead-Studios#159
Closes Lead-Studios#161
Closes Lead-Studios#162

drips-wave · 2026-03-26T20:17:17Z

@portableDD Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

BigBen-7 approved these changes Mar 26, 2026

Merge branch 'main' into feat/issues-157-159-161-162

7662575

BigBen-7 approved these changes Mar 26, 2026

BigBen-7 merged commit 504b6a7 into Lead-Studios:main Mar 26, 2026
1 of 4 checks passed

BigBen-7 merged 2 commits into Lead-Studios:main from portableDD:feat/issues-157-159-161-162
