help="Allow to use custom code for the modeling hosted in the model repository. This option should only be set for repositories you trust and in which you have read the code, as it will execute on your local machine arbitrary code present in the model repository.",
60
75
)
76
+
optional_group.add_argument(
77
+
"--compiler_workdir",
78
+
type=Path,
79
+
help="Path indicating the directory where to store intermediary files generated by Neuronx compiler.",
80
+
)
81
+
optional_group.add_argument(
82
+
"--disable-weights-neff-inline",
83
+
action="store_true",
84
+
help="Whether to disable the weights / neff graph inline. You can only replace weights of neuron-compiled models when the weights-neff inlining has been disabled during the compilation.",
     Parse the level of optimization the compiler should perform. If not specified, apply `O2` (the best balance between model performance and compile time).
+    (NEURONX ONLY) Parse the level of optimization the compiler should perform. If not specified, apply `O2` (the best balance between model performance and compile time).
             The Neuron configuration associated with the exported model.
         output (`Path`):
             Directory to store the exported Neuron model.
+        compiler_workdir (`Optional[Path]`, defaults to `None`):
+            The directory used by neuronx-cc, where you can find intermediary outputs (neff, weights, hlo, ...).
+        inline_weights_to_neff (`bool`, defaults to `True`):
+            Whether to inline the weights into the neff graph. If set to `False`, the weights will be separated from the neff.
         auto_cast (`Optional[str]`, defaults to `None`):
             Whether to cast operations from FP32 to lower precision to speed up inference. Can be `None`, `"matmul"` or `"all"`: use `None` to disable any auto-casting, `"matmul"` to cast only FP32 matrix multiplication operations, and `"all"` to cast all FP32 operations.
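Putting the documented parameters together, the signature below is a hedged sketch of how they fit into the exporter's call. `export_neuronx_sketch` is a stub stand-in (the real function name and the model/config classes are not shown in this diff), so treat it as an illustration of the arguments rather than the actual API.

from pathlib import Path
from typing import Any, Optional

def export_neuronx_sketch(
    model: Any,                               # the model to compile (class not shown in this diff)
    config: Any,                              # the Neuron configuration associated with the exported model
    output: Path,                             # where the exported Neuron model is stored
    compiler_workdir: Optional[Path] = None,  # neuronx-cc intermediaries (neff, weights, hlo, ...)
    inline_weights_to_neff: bool = True,      # False keeps the weights separate from the neff
    auto_cast: Optional[str] = None,          # None, "matmul" or "all"
) -> None:
    ...  # compilation itself is out of scope for this sketch

# Example call: keep compiler intermediaries and leave the weights replaceable.
export_neuronx_sketch(
    model=None,
    config=None,
    output=Path("bert_neuron"),
    compiler_workdir=Path("compiler_workdir"),
    inline_weights_to_neff=False,
    auto_cast="matmul",
)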