MBD Batch Improvements: Continued stories
User Story:
As a data engineer, I want to set up an internal batch scoring MBD API endpoint, so that I can process large datasets efficiently for the data team and provide results in a downloadable CSV file.
Acceptance Criteria:
GIVEN the internal API endpoint,
WHEN the data team submits a list of addresses with their API key,
THEN the API should provide an estimated processing time and a job ID, allow status checks via a separate endpoint, and return an S3 bucket link to download the CSV file with the results when the job is completed.
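A minimal client-side sketch of that flow, assuming hypothetical routes, auth header, and response field names (none of these are decided yet):

```python
import time

import requests

API_BASE = "https://example.internal/api"  # placeholder host; real routes are TBD
API_KEY = "<data-team-api-key>"            # placeholder credential

# 1. Submit the address list; the endpoint returns a job ID and an estimated processing time.
submit = requests.post(
    f"{API_BASE}/mbd/batch-score",
    headers={"Authorization": f"Token {API_KEY}"},
    json={"addresses": ["0xabc...", "0xdef..."]},
)
submit.raise_for_status()
job = submit.json()  # e.g. {"job_id": "...", "estimated_seconds": 1800}

# 2. Poll the separate status endpoint until the job completes.
while True:
    status = requests.get(
        f"{API_BASE}/mbd/batch-score/{job['job_id']}",
        headers={"Authorization": f"Token {API_KEY}"},
    ).json()
    if status["state"] == "DONE":
        break
    time.sleep(30)

# 3. Download the results CSV via the S3 link returned on completion.
with open("mbd_results.csv", "wb") as out:
    out.write(requests.get(status["s3_url"]).content)
```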
Tech Details:
This is a continuation of stories #2794, #2795, and #2796, which have already done work on improving the MBD batch model.
Story 4:
Save the data in the DB instead of S3, then do a data dump at the end of the run to create the S3 file
Two separate Django models
Cron job (see the sketch after this list)
(team will review during the hangout meeting)
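A minimal sketch of what Story 4 could look like, assuming placeholder model names, fields, status values, and bucket name; the dump function is the piece a cron job would call at the end of a run:

```python
import csv
import io

import boto3
from django.db import models


class BatchScoringRequest(models.Model):
    """One batch job submitted by the data team (names/fields are placeholders)."""
    created_at = models.DateTimeField(auto_now_add=True)
    status = models.CharField(max_length=32, default="PENDING")
    s3_url = models.URLField(blank=True)  # set once the end-of-run dump has been uploaded


class BatchScoringResult(models.Model):
    """One scored address belonging to a batch job (names/fields are placeholders)."""
    request = models.ForeignKey(
        BatchScoringRequest, on_delete=models.CASCADE, related_name="results"
    )
    address = models.CharField(max_length=100)
    score = models.FloatField(null=True)


def dump_completed_jobs_to_s3(bucket="mbd-batch-results"):
    """Intended to run from cron: write each finished job's results to a CSV in S3."""
    s3 = boto3.client("s3")
    for job in BatchScoringRequest.objects.filter(status="DONE", s3_url=""):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["address", "score"])
        for row in job.results.all():
            writer.writerow([row.address, row.score])
        key = f"batch-results/{job.pk}.csv"
        s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue().encode())
        job.s3_url = f"s3://{bucket}/{key}"
        job.save(update_fields=["s3_url"])
```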
Story 5: Infra update(s)
Move DNS records for the data science models to public names that resolve to private IP addresses (see the sketch after this story)
Add VPN access to the staging and review environments
3 pts
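A minimal sketch of the DNS piece with boto3, assuming a placeholder hosted zone ID, hostname, and private IP:

```python
import boto3

route53 = boto3.client("route53")

# Upsert an A record whose public name resolves to a private (VPC) IP address,
# so the data science model endpoints are only reachable from inside the network/VPN.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000000000",  # placeholder public hosted zone
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "mbd-models.example.com",  # placeholder public name
                    "Type": "A",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": "10.0.12.34"}],  # placeholder private IP
                },
            }
        ]
    },
)
```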
Story 6: Improve the Queries (Data science team)
Improve the Lambdas' data gathering and analysis
Could create libraries specific to querying and processing
Optimize model logic
Story 7:
Move processing to shared resources or a separate Passport resource
Open Questions:
Notes/Assumptions: