@@ -4,28 +4,32 @@ This roadmap presents an overview of the features we are currently planning to
 implement. Please note that this is a living document that will evolve as
 priorities grow and shift.

-### version 0.4.0
+## version 0.3.1

-This is the list of features that we want to have implemented by the next version.
+ * Support for C++ templated kernels through NVRTC and template parameter tuning
+ * Document a vocabulary of reserved/special tunable parameter names

- * Extend Fortran support, no more warnings on data types or missing block size parameter etc.
- * Turn the C backend into a more general compiler backend
- * A get_parameterized_kernel_source function to return the parameterized kernel source for inspection
- * Function to instrument source files with parameter values after tuning
- * Function to generate wrapper kernels for directly calling device functions
-
-### version 1.0.0
+## version 0.3.2
+
+ * Support for specifying metrics, using tunable parameters, results, and/or a function
+ * Allow strategies to tune for a metric other than time
+
+## version 0.4.0
+
+ * Multi-objective optimization
+
+## version 1.0.0

 These functions are to be implemented by version 1.0.0, but may already be
 implemented in earlier versions.

- * Functionality for including auto-tuned kernels in applications
  * Tuning kernels in parallel on a set of nodes in a GPU cluster
+ * Functionality for including auto-tuned kernels in applications

-### Low priority
+## Wish list

 These are the things that we would like to implement, but we currently have no
-demand for it. If you are interested in any of these, let us know!
+immediate demand for it. If you are interested in any of these, let us know!

  * Option to set dynamically allocated shared memory for CUDA backend
  * Option to set function that computes search space restriction, instead of a list of strings
@@ -35,5 +39,8 @@ demand for it. If you are interested in any of these, let us know!
  * Example that tunes a kernel using thread block re-indexing
  * Example CUDA host code that uses runtime compilation
  * A test_kernel function to perform parameterized testing without tuning
-
-
+ * Extend Fortran support, no more warnings on data types or missing block size parameter etc.
+ * Turn the C backend into a more general compiler backend
+ * A get_parameterized_kernel_source function to return the parameterized kernel source for inspection
+ * Function to generate wrapper kernels for directly calling/testing device functions
+ * Function to instrument source files with parameter values after tuning