scalable batch inference for millions of time series (SKU sales, 1000-day horizon) on SageMaker #368
skwskwskwskw started this conversation in Ideas
Replies: 1 comment
@skwskwskwskw I think any of the options you propose should be good enough because the Chronos-2 model is extremely fast. My recommendation would be to deploy the model on SageMaker JumpStart as shown in this notebook; that would likely be enough. You could also use the JumpStart model for a batch-transform job. For comparisons to other models, you can check fev-bench and GIFT-Eval. I also think that a forecast horizon as long as 1,000 days is likely not a great idea; you would be better off aggregating the data and forecasting at a coarser granularity (weekly or monthly) instead of using such a long prediction horizon.
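For reference, a minimal sketch of that route, covering both a real-time endpoint and a batch-transform job. The model ID, payload schema, instance types, and S3 paths are placeholders/assumptions, not values taken from the notebook — check the notebook for the exact ones:

```python
# Minimal sketch of the SageMaker JumpStart route (placeholder values throughout).
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="<chronos-2-jumpstart-model-id>")  # placeholder model ID

# Option A: real-time endpoint, as in the linked notebook.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
payload = {
    # Assumed payload shape -- the notebook documents the actual request schema.
    "inputs": [{"item_id": "sku-0001", "target": [12.0, 15.0, 11.0, 14.0]}],
    "parameters": {"prediction_length": 64},
}
forecast = predictor.predict(payload)

# Option B: batch transform over JSON Lines files staged in S3
# (assuming the JumpStart model package supports batch transform).
transformer = model.transformer(
    instance_count=4,
    instance_type="ml.g5.2xlarge",
    output_path="s3://my-bucket/chronos-output/",  # placeholder bucket
)
transformer.transform(
    data="s3://my-bucket/chronos-input/",  # one JSON object per line
    content_type="application/jsonlines",
    split_type="Line",
)
transformer.wait()
```

Following the aggregation suggestion, the input series could be resampled to weekly totals before being written to S3, which shrinks the effective horizon from 1,000 daily steps to roughly 143 weekly ones.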
Request:
Context
I need to run forecast inference for millions of SKU-level time series (SKU × location/product hierarchy). The forecast horizon is 1,000 days. I’m looking for guidance on a cost-efficient and operationally sound approach on Amazon SageMaker.
Questions
Throughput & cost efficiency
Naïve per-series loops won’t finish in a reasonable time. What’s the most cost-effective pattern on SageMaker for large-scale batch inference here (e.g., Batch Transform vs. distributed inference on Processing jobs, SageMaker Inference Pipelines, Async endpoints, or other recommended patterns)?
High cardinality (SKU × hierarchy × location)
The number of series explodes due to product hierarchy and geography. Any best practices to partition/shard the workload and optimize parallelism (data layout in S3, instance types/count, I/O considerations) to keep runtime and cost under control?
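For concreteness, the kind of data layout I have in mind is sketched below (paths, column names, and shard count are placeholders; at real scale this step would itself run on a distributed engine, but the partitioning idea is the point):

```python
# Hypothetical sharding of SKU x location series into fixed-size S3 partitions,
# so that each worker (Batch Transform instance / Processing container) can be
# pointed at its own prefix. All values here are placeholders.
import pandas as pd

N_SHARDS = 512  # tune so each shard fits comfortably in one worker's memory

df = pd.read_parquet("s3://my-bucket/sales/")  # columns: item_id, location_id, ds, y
df["series_id"] = df["item_id"] + "|" + df["location_id"]
df["shard"] = pd.util.hash_pandas_object(df["series_id"], index=False) % N_SHARDS

for shard_id, shard_df in df.groupby("shard"):
    # Hash-partitioning on series_id keeps every row of a given series in the
    # same shard, so each worker sees complete histories.
    shard_df.to_parquet(f"s3://my-bucket/sales-sharded/shard={shard_id:04d}/part.parquet")
```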
Method comparison / benchmarks
My current solution uses Nixtla's mlforecast with LightGBM (mlforecast + lightgbm). Are there benchmarks or case studies comparing this setup with your recommended approach (accuracy, latency, and cost) at a similar scale?
Environment & constraints (if helpful)
mlforecast + lightgbm (a sketch of the current setup is at the end of this post)
What I'm hoping for
A comparison of mlforecast-LightGBM to your method at high cardinality
Thanks in advance for any pointers, designs, or examples!
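For reference, the current baseline looks roughly like the following (a minimal sketch; column names, lags, and hyperparameters are placeholders rather than the production configuration):

```python
# Rough sketch of the current mlforecast + LightGBM baseline (placeholder values).
import lightgbm as lgb
import pandas as pd
from mlforecast import MLForecast

df = pd.read_parquet("s3://my-bucket/sales-sharded/shard=0000/part.parquet")
# mlforecast expects long format with columns: unique_id, ds (timestamp), y (target).
df = df.rename(columns={"series_id": "unique_id"})[["unique_id", "ds", "y"]]

fcst = MLForecast(
    models=[lgb.LGBMRegressor(n_estimators=200)],
    freq="D",
    lags=[1, 7, 28],
)
fcst.fit(df)
predictions = fcst.predict(h=1000)  # the 1,000-day horizon in question
```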