
Spot precision option #2

Open · wants to merge 8 commits into master
Conversation

rabarona (Owner)

No description provided.


oni-ml main component uses Spark and Spark SQL to analyze network events and produce a list of least probable events or most suspicious.
spot-ml main component uses Spark and Spark SQL to analyze network events and produce a list of least probable events or most suspicious.


Suggestion: "produce a list of those considered the most unlikely."


As a first step, users need to decide whether to change from 64-bit to 32-bit floating point probabilities; if they do, the document probability distribution lookup table will be half the size and more easily broadcast.
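To make the size argument concrete, here is a minimal Scala sketch (the map contents are invented for illustration, not the project's actual structures): a Float occupies 4 bytes against a Double's 8, so converting the values roughly halves the table's payload.

```scala
// Hypothetical document-probability lookup table at 64-bit precision.
val probabilities64: Map[String, Double] = Map("doc-1" -> 0.0137, "doc-2" -> 0.0021)

// The same table at 32-bit precision: each value drops from 8 bytes to 4,
// so the broadcast payload shrinks by roughly half (modulo per-entry overhead).
val probabilities32: Map[String, Float] =
  probabilities64.map { case (doc, p) => doc -> p.toFloat }
```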

If users want to cut the payload on half, they should set precision option to 32.


Suggestion: "If users want to cut payload memory consumption roughly in half, they should set the precision option to 32."

* limitations under the License.
*/

package org.apache.spot.utilities.transformation


I think we can just have the utilities package; the things in transformation don't really interact with each other much

rabarona (Owner, Author):

Yeah, it was more for classification.

*
*/

sealed trait PrecisionUtility extends Serializable {


perhaps more clear: FloatingPointPrecisionReducer

rabarona (Owner, Author):

But then we have the toDoubles() method; it reduces the precision and then increases it back. What about FloatingPointPrecision or FloatingPointPrecisionUtility?
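For readers following along, here is a minimal sketch of the sealed-trait design under discussion, using the FloatingPointPrecisionUtility name the PR later adopted; apart from toDoubles, the member names are assumptions drawn from this thread rather than the PR's exact code:

```scala
sealed trait FloatingPointPrecisionUtility extends Serializable {
  type TargetType
  def toTargetType(value: Double): TargetType
  def toDoubles(values: Seq[TargetType]): Seq[Double]
}

// 64-bit implementation: no reduction in precision is done.
object FloatingPointPrecisionUtility64 extends FloatingPointPrecisionUtility {
  type TargetType = Double
  def toTargetType(value: Double): Double = value
  def toDoubles(values: Seq[Double]): Seq[Double] = values
}

// 32-bit implementation: narrow to Float going in, widen back coming out.
object FloatingPointPrecisionUtility32 extends FloatingPointPrecisionUtility {
  type TargetType = Float
  def toTargetType(value: Double): Float = value.toFloat
  def toDoubles(values: Seq[Float]): Seq[Double] = values.map(_.toDouble)
}
```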

* PrecisionUtility will transform a number from Double to Float if the precision option is set to 32 bit;
* if the default or 64 bit is selected, it will just return the same number as type Double.
*
* This abstract class permits the execution of a single path during the entire analysis. Instead of checking what


no need to provide "why" in the scaladoc
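The "single path" point in that scaladoc can be read as: resolve the precision option once, up front, and pass the chosen implementation through the whole analysis, so nothing downstream branches on 32 vs 64 bit. A hedged sketch reusing the trait from the sketch above (the selector function is hypothetical):

```scala
// Hypothetical selector: map the command-line precision option to one
// implementation; everything after this works against the trait.
def selectPrecisionUtility(precisionOption: String): FloatingPointPrecisionUtility =
  precisionOption match {
    case "32" => FloatingPointPrecisionUtility32
    case _    => FloatingPointPrecisionUtility64 // 64 bit is the default
  }
```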


/**
* Converts a number into the precision type; it can be Float (32) or Double (64).
* For the Double implementation it will just return the same value without any transformation.


strike this; it should go with the implementation of the Double version

def toDoubles[A <% Traversable[TargetType], B <% Traversable[Double]](targetTypeIterable: A): B

/**
* Converts a DataFrame column from Seq[Double] to a Seq[TargetType]. If the TargetType is Double, it will


comment about what it will do in the case of Double should go with the Double implementation


/**
* PrecisionUtility implementation for Double.
* Users who don't want to reduce the workload can continue working with Doubles. This implementation will receive


no need to explain use case beyond "no reduction in precision is done"

@NathanSegerlind left a comment


I really prefer this version of the precision utility that better uses the Scala type system, thanks for doing it :)

approved but for a few comment/naming things... to be honest, I don't like the "transformation" subpackage of utilities, but that is a trivial change

Ricardo Barona added 7 commits May 31, 2017 12:34

… Float and reduce the payload when joining original data and LDA results (document probabilities).

Added parameters for using Doubles (64 bit) or Float (32 bit); added a parameter for Spark's autoBroadcastJoinThreshold, which will change the default value for the job execution at runtime.
Added new properties to the spot.conf file, one for the scaling option and one more for the autoBroadcastJoinThreshold.
Added a unit test for the new functionality.
Slightly changed the package structure for utilities: moved all the transformation utilities to a new package called transformation.
Fixed a few markdown issues, and replaced oni references with spot.
Added documentation for the 2 new options, SCALING_OPTION and SPK_AUTO_BRDCST_JOIN_THR.

…ilities.

Changed the command line option from scaling to precision.
Changed the name of PrecisionUtility to FloatingPointPrecisionUtility.
Minor documentation fixes.
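Regarding the autoBroadcastJoinThreshold commit above: spark.sql.autoBroadcastJoinThreshold is a standard Spark SQL setting, and a job can override it when the session is built. A minimal sketch (the app name and the 100 MB value are placeholders, not the project's defaults):

```scala
import org.apache.spark.sql.SparkSession

// Raise the broadcast-join threshold (in bytes) for this run so the halved
// probability table is more likely to qualify for a broadcast join.
val spark = SparkSession.builder()
  .appName("spot-ml-precision-example") // placeholder name
  .config("spark.sql.autoBroadcastJoinThreshold", 100L * 1024 * 1024)
  .getOrCreate()
```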
rabarona pushed a commit that referenced this pull request Jun 22, 2017