Commit

easier intgemm cmdparsing
XapaJIaMnu committed Jul 10, 2020
1 parent 5b93079 commit 22842e1
Showing 3 changed files with 40 additions and 17 deletions.
14 changes: 3 additions & 11 deletions src/common/config_parser.cpp
@@ -629,18 +629,10 @@ void ConfigParser::addOptionsTranslation(cli::CLIWrapper& cli) {
   addSuboptionsDevices(cli);
   addSuboptionsBatching(cli);
 
-  cli.add<bool>("--optimize",
-      "Optimize speed aggressively sacrificing memory or precision by using 16bit integer CPU multiplication. Only available on CPU");
-  cli.add<bool>("--optimize8",
-      "Optimize speed even more aggressively sacrificing memory or precision by using 8bit integer CPU multiplication. Only available on CPU");
-  cli.add<bool>("--intgemm-shifted",
-      "Use a shifted GEMM implementation. Only available with intgemm8.");
-  cli.add<bool>("--intgemm-shifted-all",
-      "Use a shifted GEMM implementation even for operations without biases. Only available with intgemm8.");
+  cli.add<std::string>("--gemm-precision",
+      "Use lower precision for the GEMM operations only. Supported values: float32, int16, int8, int8shift, int8shiftAlpha, int8shiftAll, int8shiftAlphaAll", "float32");
   cli.add<bool>("--dump-quantmult",
-      "Dump the quantization multipliers during an avarage run.");
-  cli.add<bool>("--use-precomputed-alphas",
-      "Use precomputed alphas for bias calculation.");
+      "Dump the quantization multipliers during an average run. To be used to compute alphas for --gemm-precision int8shiftAlpha or int8shiftAlphaAll.");
   cli.add<bool>("--use-legacy-batching",
       "Use legacy codepath with a for loop of cblas_sgemm, instead of cblas_sgemm_batched.");
   cli.add<bool>("--skip-cost",

36 changes: 36 additions & 0 deletions src/tensors/backend.h
@@ -27,6 +27,42 @@ class Backend {
   virtual void configureDevice(Ptr<Options const> options) = 0;
   virtual void synchronize() = 0;
 
+  virtual void configureIntgemm(Ptr<Options const> options) {
+    std::string gemmPrecision = options->get<std::string>("gemm-precision");
+    bool dumpQuantMults = options->get<bool>("dump-quantmult");
+    if (dumpQuantMults) {
+      setOptimized8(true);
+      setShifted(true);
+      setShiftedAll(true);
+      setDumpQuantMult(true);
+    //float32, int16, int8, int8shift, int8shiftAlpha, int8shiftAll, int8shiftAlphaAll
+    } else if (gemmPrecision == "float32") {
+      // Default case, all variables are false. Do nothing
+    } else if (gemmPrecision == "int16") {
+      setOptimized(true);
+    } else if (gemmPrecision == "int8") {
+      setOptimized8(true);
+    } else if (gemmPrecision == "int8shift") {
+      setOptimized8(true);
+      setShifted(true);
+    } else if (gemmPrecision == "int8shiftAlpha") {
+      setOptimized8(true);
+      setShifted(true);
+      setPrecomputedAlpha(true);
+    } else if (gemmPrecision == "int8shiftAll") {
+      setOptimized8(true);
+      setShifted(true);
+      setShiftedAll(true);
+    } else if (gemmPrecision == "int8shiftAlphaAll") {
+      setOptimized8(true);
+      setShifted(true);
+      setShiftedAll(true);
+      setPrecomputedAlpha(true);
+    } else {
+      ABORT("Unknown option {} for command line parameter gemm-precision.", gemmPrecision);
+    }
+  }
+
   virtual void setClip(float clipValue) { clipValue_ = clipValue; }
   float getClip() { return clipValue_; }
 
7 changes: 1 addition & 6 deletions src/tensors/cpu/backend.h
@@ -25,13 +25,8 @@ class Backend : public marian::Backend {
   void setDevice() override {}
 
   void configureDevice(Ptr<Options const> options) override {
+    configureIntgemm(options);
     setClip(options->get<float>("clip-gemm"));
-    setOptimized(options->get<bool>("optimize"));
-    setOptimized8(options->get<bool>("optimize8"));
-    setShifted(options->get<bool>("intgemm-shifted"));
-    setShiftedAll(options->get<bool>("intgemm-shifted-all"));
-    setDumpQuantMult(options->get<bool>("dump-quantmult"));
-    setPrecomputedAlpha(options->get<bool>("use-precomputed-alphas"));
     setLegacyBatchedGemm(options->get<bool>("use-legacy-batching"));
   }
   void synchronize() override {}

11 comments on commit 22842e1

@ugermann (Collaborator)

A few things:

  1. As far as I understand, --optimize is currently a command line option in a released version of Marian. Removing that command line option is a change to the API (well, the CLI, but the idea is the same) and would thus mandate a change in the major version of Marian (due to the semantic versioning that Marian has adopted). I would therefore advise leaving that option in (e.g., as an alias for --gemm-precision int16), simply for backwards compatibility.

  2. There's no gemm with float16?

  3. I assume that all the '... shift ...' options relate to the bias. Would it make sense to have --gemm-precision {float32 | [float16 |] int16 | int8} and --gemm-int8-shift { none | shift | alpha | all | alpha-all } (a sketch follows after this list)? If --gemm-int8-shift has no effect in certain situations, that's fine; it's options that contradict each other that I don't like.

  4. The whole int-gemm thing should be documented somewhere. Most users will have no idea what it's all about and what all these options mean. Ultimately, this should be in the Marian documentation, but that doesn't seem to have been updated in quite some time. For the time being, the Wiki should do.
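
To make point 3 concrete, here is a sketch of what I have in mind, reusing the cli.add calls from the diff above. The option names, help texts and defaults are only my proposal and exist nowhere in the code:

    // Hypothetical interface for point 3 -- a proposal, not implemented anywhere.
    cli.add<std::string>("--gemm-precision",
        "Precision of the GEMM operations only. Supported values: float32, int16, int8",
        "float32");
    cli.add<std::string>("--gemm-int8-shift",
        "Bias/shift handling for int8 GEMM. Supported values: none, shift, alpha, all, alpha-all. "
        "Ignored unless --gemm-precision int8 is given.",
        "none");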

@XapaJIaMnu (Contributor, Author)

@ugermann

  1. This will change in marian master, eventually. Marcin doesn't like the uninformative --optimize. In my pull request for intgemm, I have already proposed a change similar to this interface. There's no alpha stuff in that pull request so it makes things a bit simpler. Marcin is supposed to review it at some point.
  2. The --fp16 mode is pervasive, not just for the GEMM operations, and it is only available on GPU. The optimize options quantize before performing the GEMM and requantize afterwards (toy sketch after this list).
  3. Yes, the shifting is part of the bias handling, but it's only available for the int8 instructions. This is why I have it in the same option. I'm not sure that --gemm-int8-shift { none | shift | alpha | all | alpha-all } is more comprehensible.
  4. Yes, I agree. Is this the wiki you're referring to? https://github.com/browsermt/coordination/wiki I will explain what the options are there.
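
To illustrate point 2, here is a toy sketch of the quantize -> integer GEMM -> requantize idea, written just for this thread. It is not marian's or intgemm's actual code (intgemm uses SIMD kernels and very different data layouts); it only shows where the quantization multipliers come in:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Toy 8-bit matrix-vector product: quantize the float inputs, multiply in
    // integers, then rescale ("requantize") the result back to float.
    std::vector<float> int8Gemv(const std::vector<float>& A,  // rows x cols, row-major
                                const std::vector<float>& x,  // cols entries
                                size_t rows, size_t cols) {
      auto maxAbs = [](const std::vector<float>& v) {
        float m = 0.f;
        for (float f : v) m = std::max(m, std::fabs(f));
        return m;
      };
      // Quantization multipliers: map the largest absolute value onto 127.
      float quantMultA = 127.f / maxAbs(A);
      float quantMultX = 127.f / maxAbs(x);

      std::vector<int8_t> qA(A.size()), qX(x.size());
      for (size_t i = 0; i < A.size(); ++i)
        qA[i] = static_cast<int8_t>(std::round(A[i] * quantMultA));
      for (size_t i = 0; i < x.size(); ++i)
        qX[i] = static_cast<int8_t>(std::round(x[i] * quantMultX));

      // Integer multiply-accumulate, then undo both scalings.
      std::vector<float> y(rows, 0.f);
      for (size_t r = 0; r < rows; ++r) {
        int32_t acc = 0;
        for (size_t c = 0; c < cols; ++c)
          acc += static_cast<int32_t>(qA[r * cols + c]) * qX[c];
        y[r] = static_cast<float>(acc) / (quantMultA * quantMultX);
      }
      return y;
    }

Roughly, the shifted variants of point 3 additionally move the quantized activations into the unsigned range (which the relevant SIMD instructions need) and fold the resulting correction term into the bias, which is why plain shifting only applies to operations with biases and int8shiftAll extends it to the rest.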

@ugermann (Collaborator)

re 1: Well then, my biggest concern was a code reviewer insisting that --optimize stay in there. If --optimize is not loved anyway, my point regarding that one is sort of moot (notwithstanding the semantic versioning issue).
re 3: I have no strong opinion on that; I'm fine with having the full slew of options as possible values of a single string-valued parameter.
re 4: I was thinking about the marian-dev wiki, https://github.com/marian-nmt/marian-dev/wiki. This is Marian-related, not Bergamot-related.

@XapaJIaMnu (Contributor, Author)

re 4: This won't be accepted into the marian wiki until intgemm makes it into master, and even then it would be without the alpha stuff. I propose a wiki entry in the Bergamot wiki, which we can then copy over, plus the single-string format, with the asterisk that it might have to change based on how Marcin decides to treat the command line switches.

@ugermann (Collaborator)

Oi the joys of being a Marian contributor! Remind me: why again are we doing this?

I don't really care much about where it is as long as it's somewhere reasonable, but bergamot/coordination doesn't strike me as the right place. I don't mind if it's on the bergamot/mts wiki, or on the wiki of the fork that you created on your user account. It could also just be a markdown file in ./docs in marian-dev, in whatever branch we are on these days.

What I do care about is some form of explanation of what all these things mean, because to the uninitiated, the intgemm stuff will require a considerable amount of explanation.

You know what: you could also create a blog post (wherever, as long as it's public and somewhat permanent) where you explain things. That way you will at least get the recognition for implementing and documenting this stuff. So basically I want something where the help message would say: see .... for details. I kind of get the intgemm stuff (do integer computations instead of floats), but "precomputed alpha" is something that I'd need to research in more detail.

@XapaJIaMnu (Contributor, Author)

Does this work for you? If it's not clear, I can expand: https://github.com/browsermt/coordination/wiki/Low-precision-GEMM-and-Intgemm
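
The very short version of the alpha part, since that seems to be the most opaque bit: with the Alpha variants, the quantization multiplier for the activations is not measured from the current tensor at run time but derived from statistics recorded earlier with --dump-quantmult. A toy sketch of the idea (illustration only; the names are made up and this is not the actual implementation):

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Illustration only: how a precomputed alpha changes the choice of the
    // quantization multiplier. Function and parameter names are invented.
    float quantMultiplier(const std::vector<float>& activations,
                          bool usePrecomputedAlpha,
                          float alpha /* recorded offline via --dump-quantmult */) {
      if (usePrecomputedAlpha)
        return 127.0f / alpha;  // fixed scale, no pass over the tensor at run time
      float maxAbs = 0.0f;      // otherwise measure the current tensor
      for (float v : activations)
        maxAbs = std::max(maxAbs, std::fabs(v));
      return 127.0f / maxAbs;
    }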

@XapaJIaMnu (Contributor, Author)

As for why we are doing this: because the in-master implementation of 8-bit CPU GEMM (fbgemm) doesn't allow for an architecture-agnostic binary format, requires prior binarization, AND is slower.

@ugermann (Collaborator)

As I said earlier, I'm not sure if https://github.com/browsermt/coordination/wiki is the right place, but "anywhere" is much, much better than "nowhere", so I'm more than happy to go along with this for the time being. It should be reasonably easy to move documentation to the right place eventually.

So thanks for the documentation! It greatly helps me to understand what's going on. I may comment in more detail later, but for the time being, this is a very good start!

@XapaJIaMnu (Contributor, Author)

Well, let me know if more things are necessary; I can add points and clarifications. It doesn't cost that much more effort after the initial write-up.

@ugermann (Collaborator)

> As for why we are doing this: because the in-master implementation of 8-bit CPU GEMM (fbgemm) doesn't allow for an architecture-agnostic binary format, requires prior binarization, AND is slower.

"Doing this" referred to contribution to Marian in general and was a sigh of despair with respect to getting pull requests into master. ;-)

Let me know when this is ready to be pulled into the current rest server branch.

@ugermann (Collaborator)

Note to the reader: back to this thread: browsermt/mts#1
