Spot precision option #2
base: master
Conversation
spot-ml/SPARKCONF.md (outdated)

-oni-ml main component uses Spark and Spark SQL to analyze network events and produce a list of least probable events
-or most suspicious.
+spot-ml main component uses Spark and Spark SQL to analyze network events and produce a list of least probable events
produce a list of those considered the most unlikely.
spot-ml/SPARKCONF.md (outdated)

+As a first step, users need to decide whether they want to change from 64 bit floating point probabilities to 32 bit floating point probabilities; if users decide to change from 64 to 32 bit, the document probability distribution lookup table will be half the size and more easily broadcasted.
+If users want to cut the payload on half, they should set precision option to 32.
If users want to cut payload memory consumption roughly in half, they should set the precision option to 32.
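To make the memory point concrete, a minimal sketch (the `documentProbabilities` map and its contents are hypothetical, not taken from spot-ml): a Double occupies 8 bytes and a Float 4 bytes, so casting the values of the broadcast lookup table down to Float roughly halves its payload.

```scala
// Hypothetical lookup table of per-document topic probabilities.
// Each Double value takes 8 bytes; the Float version takes 4 bytes,
// so the broadcast payload is roughly halved.
val documentProbabilities: Map[String, Array[Double]] =
  Map("10.0.0.1" -> Array(0.01, 0.92, 0.07))

// Precision reduced to 32 bit before broadcasting:
val reduced: Map[String, Array[Float]] =
  documentProbabilities.map { case (doc, probs) => doc -> probs.map(_.toFloat) }
```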
 * limitations under the License.
 */

package org.apache.spot.utilities.transformation
I think we can just have the utilities package; the things in transformation don't really interact with each other much
Yeah, it was more for classification.
 *
 */

sealed trait PrecisionUtility extends Serializable {
perhaps more clear: FloatingPointPrecisionReducer
But then we have the toDoubles() method; it reduces and then increases back. What about FloatingPointPrecision or FloatingPointPrecisionUtility?
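For readers following the thread, a minimal sketch of the sealed-trait pattern being discussed, using the FloatingPointPrecisionUtility naming; the bodies are illustrative, not the actual spot-ml implementation:

```scala
sealed trait FloatingPointPrecisionUtility extends Serializable {
  type TargetType // Float when the precision option is 32, Double when it is 64
  def toTargetType(value: Double): TargetType
  def toDoubles(values: Seq[TargetType]): Seq[Double]
}

object FloatingPointPrecisionUtility32 extends FloatingPointPrecisionUtility {
  type TargetType = Float
  def toTargetType(value: Double): Float = value.toFloat
  def toDoubles(values: Seq[Float]): Seq[Double] = values.map(_.toDouble)
}

object FloatingPointPrecisionUtility64 extends FloatingPointPrecisionUtility {
  type TargetType = Double
  def toTargetType(value: Double): Double = value // identity: no precision reduction
  def toDoubles(values: Seq[Double]): Seq[Double] = values
}
```

Picking the implementation once from the precision option (for example, `if (precision == 32) FloatingPointPrecisionUtility32 else FloatingPointPrecisionUtility64`) lets the rest of the analysis run a single code path for either width.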
 * PrecisionUtility will transform a number from Double to Float if the precision option is set to 32 bit;
 * if default or 64 bit is selected, it will just return the same number, of type Double.
 *
 * This abstract class permits the execution of a single path during the entire analysis. Instead of checking what
no need to provide "why" in the scaladoc
/**
 * Converts a number into the precision type; it can be Float (32) or Double (64).
 * For the Double implementation it will just return the same value without any transformation.
strike this; it should go with the implementation of the Double version
def toDoubles[A <% Traversable[TargetType], B <% Traversable[Double]](targetTypeIterable: A): B

/**
 * Converts a DataFrame column from Seq[Double] to a Seq[TargetType]. If the TargetType is Double, it will
comment about what it will do in the case of Double should go with the Double implementation
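As a rough illustration of the column conversion described above, a hedged sketch for the 32 bit case; the `toFloats` UDF and `castColumn` helper are hypothetical names for illustration, not the spot-ml API:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object ColumnPrecision {
  // Hypothetical helper: cast a column holding Seq[Double] probability
  // vectors down to Seq[Float] before the broadcast join.
  private val toFloats = udf((probabilities: Seq[Double]) => probabilities.map(_.toFloat))

  def castColumn(df: DataFrame, column: String): DataFrame =
    df.withColumn(column, toFloats(col(column)))
}
```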
/**
 * PrecisionUtility implementation for Double.
 * Users that don't want to reduce the workload can continue working with Doubles. This implementation will receive
no need to explain the use case beyond "no reduction in precision is done"
I really prefer this version of the precision utility that better uses the Scala type system, thanks for doing it :)
Approved, but for a few comment/naming things... to be honest, I don't like the "transformation" subpackage of utilities, but that is a trivial change.
… Float and reduce the payload when joining original data and LDA results (document probabilities). Added parameters for using Doubles (64 bit) or Floats (32 bit); added a parameter for Spark autoBroadcastJoinThreshold, which changes the default value for the job execution at runtime. Added new properties to the spot.conf file, one for the scaling option and one more for the autoBroadcastJoinThreshold. Added unit tests for the new functionality. Slightly changed the package structure for utilities: moved all the transformation utilities to a new package called transformation.
Fixed a few markdown issues plus removed oni references and replaced with spot. Added documentation for 2 new options, SCALING_OPTION and SPK_AUTO_BRDCST_JOIN_THR.
Fixed typos.
…ilities. Changed command line option from scaling to precision.
Changed name of PrecisionUtility to FloatingPointPrecisionUtility. Minor documentation fixes.
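For context on the autoBroadcastJoinThreshold parameter mentioned in the commits above: it maps onto Spark SQL's standard `spark.sql.autoBroadcastJoinThreshold` setting. A hedged sketch of overriding it when the session is built (the 100 MB value is purely illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: raise the broadcast-join threshold (in bytes) so the
// document-probability table is broadcast instead of shuffled in the join.
val spark = SparkSession.builder()
  .appName("spot-ml")
  .config("spark.sql.autoBroadcastJoinThreshold", 100L * 1024 * 1024) // ~100 MB, example value
  .getOrCreate()
```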
db12446 to 876d170