@jerinphilip (Owner) commented Aug 17, 2023

Reports a few batch-level metrics (wps, i.e. words per second, and occupancy) and some aggregates. This is mostly to verify that reading from stdin and batching work as expected before starting further perf analysis.
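As a rough illustration of what such metrics could look like, here is a minimal Python sketch. It is not the code in this PR: the names `BatchStats` and `aggregate` are hypothetical, and it assumes that occupancy means the fraction of a padded batch filled by real words and that wps means words processed per wall-clock second.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchStats:
    """Hypothetical per-batch record; not the structure used in this PR."""
    word_counts: List[int]  # real (unpadded) word count of each sentence
    seconds: float          # wall-clock time spent on the batch

    @property
    def words(self) -> int:
        return sum(self.word_counts)

    @property
    def padded_size(self) -> int:
        # Every sentence is padded to the longest one in the batch.
        return len(self.word_counts) * max(self.word_counts, default=0)

    @property
    def occupancy(self) -> float:
        # Assumed definition: real words / padded slots, in [0, 1].
        return self.words / self.padded_size if self.padded_size else 0.0

    @property
    def wps(self) -> float:
        return self.words / self.seconds if self.seconds > 0 else 0.0


def aggregate(batches: List[BatchStats]) -> dict:
    """Aggregate over batches: overall wps and slot-weighted occupancy."""
    words = sum(b.words for b in batches)
    seconds = sum(b.seconds for b in batches)
    slots = sum(b.padded_size for b in batches)
    return {
        "wps": words / seconds if seconds > 0 else 0.0,
        "occupancy": words / slots if slots else 0.0,
    }
```

Under this definition, a low occupancy would suggest that sentences of very different lengths are being batched together, which is what sorting the input is meant to avoid.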

# Download WNGT20 dataset into data/wngt20
bash scripts/download-wngt20.sh

# Use a Python script to sort for better batching
# Sorted file is available as data/wngt20/sources.shuf.sorted
python3 scripts/order-sources-shuf.py data/wngt20/sources.shuf
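For readers unfamiliar with why sorting helps batching, the idea can be sketched as below. This is not the contents of `scripts/order-sources-shuf.py`; the function name `sort_by_length` and the choice of whitespace-separated word count as the sort key are assumptions made for illustration.

```python
from typing import Iterable, List


def sort_by_length(lines: Iterable[str]) -> List[str]:
    """Order sentences by word count, shortest first.

    Assumption: length is the number of whitespace-separated words.
    Python's sort is stable, so equal-length lines keep their input order.
    """
    return sorted(lines, key=lambda line: len(line.split()))
```

With the input ordered this way, consecutive sentences have similar lengths, so a batch built from neighbouring lines should need little padding.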

TODO:

  • gprof
  • I want to try Tracy, mostly as a learning exercise, to see if it is of any use.
  • cachegrind
  • Some mechanism to measure speed per commit, so we can track progress on speed. The mandate is to continuously improve and never regress. Something like perf.iree.dev?
  • Conceptual improvements: maybe prune samples as they complete during greedy decoding, so that later steps do fewer matmuls?
  • In certain functions, some trickling down from 512-bit to 256-bit to 128-bit SIMD registers is possible; not sure how much of a speedup this would give.
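
The pruning idea in the list above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not code from this repository: `greedy_decode` and `step_fn` are hypothetical names, and `step_fn` stands in for one model forward pass that returns the next token for each still-active sample.

```python
from typing import Callable, List, Tuple


def greedy_decode(
    step_fn: Callable[[List[int], List[List[int]]], List[int]],
    batch_size: int,
    eos_id: int,
    max_steps: int,
) -> Tuple[List[List[int]], int]:
    """Greedy decoding that drops samples once they emit EOS.

    step_fn(active_ids, prefixes) returns one next-token id per active sample.
    Returns the decoded sequences and the number of rows evaluated in total,
    which is a proxy for matmul work.
    """
    outputs: List[List[int]] = [[] for _ in range(batch_size)]
    active = list(range(batch_size))
    rows_computed = 0
    for _ in range(max_steps):
        if not active:
            break  # every sample has finished
        next_tokens = step_fn(active, [outputs[i] for i in active])
        rows_computed += len(active)
        still_active = []
        for idx, tok in zip(active, next_tokens):
            outputs[idx].append(tok)
            if tok != eos_id:
                still_active.append(idx)
        active = still_active  # finished samples are pruned here
    return outputs, rows_computed
```

Without pruning, every step would evaluate all `batch_size` rows until the longest sample finishes; here the per-step row count shrinks as samples complete. Whether this pays off in practice depends on the cost of compacting the batch, which this sketch does not model.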
