mlir: Add Enzyme ops removal on structured control flow #2200

Pangoraw · 2024-12-18T22:28:19Z

TODO:

scf: support a non-constant number of iterations (Cache<f32> -> tensor<?xf32>).
scf: push/pop only once if a value is pushed multiple times.
Cache of tensor (nested for).
passes: add an option to the enzyme pass to try to remove enzyme ops after generating the function. This should help with higher-order differentiation. Ref: MLIR: post optimization pipeline #2214.
scf: graph min-cut.
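As we read the TODO list, removing Enzyme ops on structured control flow amounts to replacing the LIFO cache (pushed to by the forward `scf.for`, popped from by the reverse loop) with a buffer indexed by the induction variable. That buffer would be a static tensor when the trip count is constant and `tensor<?xf32>` otherwise. The plain C++ sketch below is not the MLIR pass; it only checks that equivalence on the loop `x = x * x`. `std::vector` stands in for both the cache and the tensor, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward loop: x_{i+1} = x_i * x_i, n times. Both functions return d(x_n)/d(x_0).

// Version 1: a LIFO cache, mirroring push in the forward loop / pop in the reverse loop.
double gradWithStackCache(double x, std::size_t n) {
  std::vector<double> cache; // used as a stack
  for (std::size_t i = 0; i < n; ++i) {
    cache.push_back(x); // "push" the value the reverse pass needs
    x = x * x;
  }
  double adj = 1.0;
  for (std::size_t i = 0; i < n; ++i) {
    double xi = cache.back(); // "pop" in reverse order
    cache.pop_back();
    adj *= 2.0 * xi; // d(x*x)/dx = 2x
  }
  assert(cache.empty()); // every push was matched by a pop
  return adj;
}

// Version 2: the cache becomes a buffer of n elements allocated before the loop
// (tensor<?xf32> if n is only known at runtime), written at index i in the
// forward loop and read at the same index while iterating in reverse.
double gradWithIndexedCache(double x, std::size_t n) {
  std::vector<double> buffer(n);
  for (std::size_t i = 0; i < n; ++i) {
    buffer[i] = x;
    x = x * x;
  }
  double adj = 1.0;
  for (std::size_t k = 0; k < n; ++k) {
    std::size_t i = n - 1 - k; // reverse iteration order
    adj *= 2.0 * buffer[i];
  }
  return adj;
}
```

For x = 3 and two iterations both functions return 108, since x_2 = x_0^4 and its derivative is 4 * 3^3.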

wsmoses · 2024-12-20T14:45:45Z

enzyme/Enzyme/MLIR/Passes/RemovalUtils.cpp

+  return mlir::enzyme::CacheInfo::batchType(mlir::ShapedType::kDynamic);
+}
+
+mlir::Type mlir::enzyme::CacheInfo::batchType(int64_t dim) {


So there is already an Enzyme AutoDiffTypeInterface, which should have a method for batching (and if not, that would probably be the right place for this).
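For reference, a simplified model of what a `batchType(width)` helper does, whether it ends up as a free function or as a method on `AutoDiffTypeInterface`: it prepends one batch dimension of size `width`, which may be the dynamic sentinel when the number of iterations is unknown. `SimpleType` and the local `kDynamic` constant are hypothetical stand-ins for `mlir::Type` and `mlir::ShapedType::kDynamic`; the `width == 1` shortcut follows the existing convention that an unbatched shadow keeps its type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Stand-in for mlir::ShapedType::kDynamic: a sentinel meaning "size unknown".
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Hypothetical, simplified model of a type: a scalar has an empty shape.
struct SimpleType {
  std::vector<int64_t> shape;
  std::string elementType;
};

// Batching by `width` prepends one leading dimension; width == 1 leaves the type alone.
SimpleType batchType(const SimpleType &self, int64_t width) {
  assert(width == kDynamic || width >= 1);
  if (width == 1)
    return self;
  SimpleType batched = self;
  batched.shape.insert(batched.shape.begin(), width);
  return batched;
}
```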

This still requires changes in the tblgen-generated derivative files. For example, `createForwardModeTangent` in `MulFOpFwdDerivative` could be altered like this:

```
LogicalResult createForwardModeTangent(Operation *op0, OpBuilder &builder,
                                       MGradientUtils *gutils) const {
  auto op = cast<arith::MulFOp>(op0);
  if (gutils->width != 1) {
    auto newop = gutils->getNewFromOriginal(op0);
    for (auto res : newop->getResults()) {
      res.setType(mlir::RankedTensorType::get({gutils->width}, res.getType()));
    }
  }
  gutils->eraseIfUnused(op);
  if (gutils->isConstantInstruction(op))
    return success();
  mlir::Value res = nullptr;
  if (!gutils->isConstantValue(op->getOperand(0))) {
    auto dif = gutils->invertPointerM(op->getOperand(0), builder);
    {
      mlir::Value itmp = ({
        // Computing MulFOp
        auto fwdarg_0 = dif;
        dif.dump();
        // TODO: gutils->makeBatched(...)
        auto fwdarg_1 = gutils->getNewFromOriginal(op->getOperand(1));
        builder.create<arith::MulFOp>(op.getLoc(), fwdarg_0, fwdarg_1);
      });
      itmp.dump();
      if (!res)
        res = itmp;
      else {
        auto operandType = cast<AutoDiffTypeInterface>(res.getType());
        res = operandType.createAddOp(builder, op.getLoc(), res, itmp);
      }
    }
  }
  if (!gutils->isConstantValue(op->getOperand(1))) {
    auto dif = gutils->invertPointerM(op->getOperand(1), builder);
    {
      mlir::Value itmp = ({
        // Computing MulFOp
        auto fwdarg_0 = dif;
        dif.dump();
        auto fwdarg_1 = gutils->getNewFromOriginal(op->getOperand(0));
        builder.create<arith::MulFOp>(op.getLoc(), fwdarg_0, fwdarg_1);
      });
      if (!res)
        res = itmp;
      else {
        auto operandType = cast<AutoDiffTypeInterface>(res.getType());
        res = operandType.createAddOp(builder, op.getLoc(), res, itmp);
      }
    }
  }
  assert(res);
  gutils->setDiffe(op->getResult(0), res, builder);
  return success();
}
```

wsmoses · 2025-01-03T17:59:46Z

enzyme/Enzyme/MLIR/Implementations/SCFAutoDiffOpInterfaceImpl.cpp

-          if (!gutils->isConstantValue(prev))
-            gutils->addToDiffe(prev, post, builder);
+    auto numIters = getConstantNumberOfIterations(forOp);
+    Value inductionVariable; // [0, N[ counter


nit: presumably N]
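`[0, N[` is the ISO-style notation for the half-open interval [0, N), i.e. the counter takes the values 0, 1, ..., N - 1. A minimal sketch of how a constant trip count N for an `scf.for` (which iterates while `iv < ub` with a positive step) can be computed, and how the normalized counter maps back to the induction variable. `constantTripCount` and `inductionValue` are hypothetical names, not the PR's `getConstantNumberOfIterations`, and integer overflow is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Trip count of `for (iv = lb; iv < ub; iv += step)`.
// Returns std::nullopt for a non-positive step, which this sketch does not handle.
std::optional<int64_t> constantTripCount(int64_t lb, int64_t ub, int64_t step) {
  if (step <= 0)
    return std::nullopt;
  if (ub <= lb)
    return 0;
  return (ub - lb + step - 1) / step; // ceil((ub - lb) / step)
}

// The normalized counter i ranges over [0, N) and maps back to iv = lb + i * step.
int64_t inductionValue(int64_t lb, int64_t step, int64_t i) {
  assert(i >= 0);
  return lb + i * step;
}
```

For lb = 0, ub = 10, step = 3 the loop visits 0, 3, 6, 9, so N = 4 and the last counter value N - 1 = 3 maps back to iv = 9.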

wsmoses · 2025-01-03T18:01:26Z

enzyme/Enzyme/MLIR/Interfaces/CloneFunction.cpp

@@ -27,7 +27,11 @@ getFunctionTypeForClone(mlir::FunctionType FTy, DerivativeMode mode,
  for (auto &&[Ty, returnPrimal, returnShadow, activity] : llvm::zip(
           FTy.getResults(), returnPrimals, returnShadows, ReturnActivity)) {
    if (returnPrimal) {
-      RetTypes.push_back(Ty);
+      if (width != 1) {


this shouldn't be modified, since width only applies to the derivative, not the primal return
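To illustrate the reviewer's point, a sketch of building the clone's result types in which only shadow returns pick up the batch width, while primal returns are passed through unchanged. `SimpleType`, `shadowType`, and `cloneResultTypes` are hypothetical simplifications of `getFunctionTypeForClone`; the real function also zips over `ReturnActivity`, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a type: a scalar has an empty shape.
struct SimpleType {
  std::vector<int64_t> shape;
  std::string elementType;
};

// Shadow type: prepend a batch dimension unless width == 1.
SimpleType shadowType(const SimpleType &t, int64_t width) {
  if (width == 1)
    return t;
  SimpleType s = t;
  s.shape.insert(s.shape.begin(), width);
  return s;
}

std::vector<SimpleType> cloneResultTypes(const std::vector<SimpleType> &results,
                                         const std::vector<bool> &returnPrimals,
                                         const std::vector<bool> &returnShadows,
                                         int64_t width) {
  assert(results.size() == returnPrimals.size());
  assert(results.size() == returnShadows.size());
  std::vector<SimpleType> out;
  for (std::size_t i = 0; i < results.size(); ++i) {
    if (returnPrimals[i])
      out.push_back(results[i]); // primal return: never batched
    if (returnShadows[i])
      out.push_back(shadowType(results[i], width)); // derivative: batched by width
  }
  return out;
}
```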

wsmoses · 2025-01-03T18:02:25Z

enzyme/Enzyme/MLIR/Interfaces/CloneFunction.cpp

@@ -232,6 +240,11 @@ FunctionOpInterface CloneFunctionWithReturns(

  {
    auto &blk = NewF.getFunctionBody().front();
+    if (width != 1) {


similarly this seems wrong?

wsmoses · 2025-01-03T18:05:03Z

enzyme/Enzyme/MLIR/Implementations/BuiltinAutoDiffTypeInterfaceImpl.cpp

-    assert(width == 1 && "unsupported width != 1");
-    return self;
+  Type getShadowType(Type self, int64_t width) const {
+    return batchType(self, width);


in a separate PR, it may be worthwhile switching getShadowType and the like to take an ArrayRef<int64_t> of indices to batch on (@jumerckx did something similar when adding batched differentiation broadcast earlier)
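One possible reading of this suggestion, sketched with hypothetical types: `getShadowType` takes a list of batch sizes and prepends them as leading dimensions. An empty list then plays the role of today's `width == 1`, and a single entry `{w}` with `w > 1` matches batching by `width = w`. The comment does not say whether the `ArrayRef<int64_t>` should hold sizes or dimension positions; this sketch assumes sizes, and `std::vector` stands in for `ArrayRef`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a type: a scalar has an empty shape.
struct SimpleType {
  std::vector<int64_t> shape;
  std::string elementType;
};

// Prepend every batch size as a leading dimension; an empty list means "not batched".
SimpleType getShadowType(const SimpleType &self, const std::vector<int64_t> &batchSizes) {
  for (int64_t size : batchSizes)
    assert(size >= 1 && "batch sizes must be positive in this sketch");
  SimpleType shadow = self;
  shadow.shape.insert(shadow.shape.begin(), batchSizes.begin(), batchSizes.end());
  return shadow;
}
```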

wsmoses

looks good, though there's some unrelated batch stuff here that probably shouldn't be here (maybe leftover from debugging)

Pangoraw and others added 2 commits December 18, 2024 23:23

mlir: Add Enzyme ops removal on structured control flow

cc88f11

format

6bf5d41

wsmoses reviewed Dec 20, 2024

wsmoses approved these changes Dec 20, 2024

jumerckx and others added 7 commits December 21, 2024 10:13

use AutoDiffTypeInterface for batching

d8efc38

remove

b603993

add test with unknown number of iterations

10186d5

don't push same value twice

4d36a14

tensor extract/insert

b869aab

Merge branch 'main' into remove-ops

cf998f8

Pangoraw marked this pull request as ready for review January 3, 2025 14:44

wsmoses reviewed Jan 3, 2025

wsmoses approved these changes Jan 3, 2025
