New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Inter rep #2256

Closed

Angelica-Schell wants to merge 15 commits into main from inter-rep

Contributor

Angelica-Schell commented Aug 8, 2024

This pull request includes intermediate representation functions to and from binary, hex, fixed, and float.

Angelica-Schell added 5 commits

August 6, 2024 06:42


          Intermediate Representation functions added and are now default. Will…

709d0fa

… only try fast track with inputs less than FAST_TRACK_THRESHOLD number of bits, currently 24.


          Addded some dependencies

51f4ee8


          Added spec for intermediate representation functions

ab02074


          formatting

c71eb92


          fixed dependencies

c317b63

bcarlet reviewed

View reviewed changes

Contributor

bcarlet left a comment

This looks like a great start @Angelica-Schell! I did leave a few notes where I think there are potential correctness problems. Accordingly, it would be great to get some testing set up for this. There's one fairly easy approach though: just add a flag that lets you force the use of the IR even for small numbers, and then do differential testing on your two implementations, using your existing test-case generator.

tools/data-conversion/src/main.rs Outdated

@@ @@ -128,44 +142,108 @@ fn convert( @@
                   match (convert_from, convert_to) {
                       (NumType::Hex, NumType::Binary) => {
                           for line in read_to_string(filepath_get).unwrap().lines() {
-                              hex_to_binary(line, &mut converted)
+                              if line.parse::<f32>().unwrap() <= FAST_TRACK_THRESHOLD as f32 {

Contributor

bcarlet Aug 8, 2024

I wonder if we should make the selection between fast-path and IR implementation based on the formats specified on the command line rather than the size of each individual value? That would require some error checking in the fast-path functions, but I'd imagine it would still be faster because it avoids doing the parsing twice. I suppose the downside is that you might miss out on the opportunity to use the fast path for values that are smaller than the format allows, so I'm not sure what the right tradeoff is.

tools/data-conversion/src/main.rs Outdated

@@ @@ -128,44 +142,108 @@ fn convert( @@
                   match (convert_from, convert_to) {
                       (NumType::Hex, NumType::Binary) => {
                           for line in read_to_string(filepath_get).unwrap().lines() {
-                              hex_to_binary(line, &mut converted)
+                              if line.parse::<f32>().unwrap() <= FAST_TRACK_THRESHOLD as f32 {

Contributor

bcarlet Aug 8, 2024

I think your value of FAST_TRACK_THRESHOLD is overly conservative in this instance. Looking at the implementation of hex_to_binary, I don't see any reason why the implementation wouldn't work for widths up to 32. In general, each of your fast-path implementations will probably have different requirements (and some may have requirements on the output format too, not just the input).

tools/data-conversion/src/main.rs Outdated

+              ///
+              /// This function will panic if the input string cannot be parsed as a binary number.
+              fn binary_to_intermediate(binary_string: &str) -> IntermediateRepresentation {

Contributor

bcarlet Aug 8, 2024

One minor organizational suggestion: perhaps these should be associated functions, with names like from_binary. (This is the rust convention for making "constructors".)

tools/data-conversion/src/main.rs Outdated

+              /// This function will panic if the input string cannot be parsed as a binary number.
+              fn binary_to_intermediate(binary_string: &str) -> IntermediateRepresentation {
+                  let bit_width = binary_string.len();

Contributor

bcarlet Aug 8, 2024

I can't remember if we've already discussed this, but I'm not sure it's the right choice to infer the width of the format from the values? (And thereby implicitly requiring that users pad all values to the correct length.) My intuition is that the width is part of the format specified on the command line, and we shouldn't require that users pad their values.

tools/data-conversion/src/main.rs Outdated

+              // Threshold for using fast-track functions
+              const FAST_TRACK_THRESHOLD: u32 = 1 << 24;
+              struct IntermediateRepresentation {

Contributor

bcarlet Aug 8, 2024

This struct should probably have some documentation. For instance, does sign == true indicate that the number is negative, non-negative, etc?

tools/data-conversion/src/main.rs Outdated

+                      -BigInt::from(inter_rep.mantissa)
+                  };
+                  let binary_str = inter_value.to_str_radix(2);

Contributor

bcarlet Aug 8, 2024

I think this will format negative numbers with a - sign instead of writing out their two's complement representation. The other "to_binary" functions appear to do the latter, so something seems inconsistent here.

tools/data-conversion/src/main.rs Outdated

+                  IntermediateRepresentation {
+                      sign,
+                      mantissa: BigUint::from_str(&mantissa_string).expect("Invalid number"),
+                      exponent: -(fractional_part.len() as i32),

Contributor

bcarlet Aug 8, 2024

This appears to be assuming that the intermediate representation uses base-10 "scientific notation," but I think we want base-2?

But taking a step back, in the past we've discussed defining decimal strings as representing double-precision floats, to avoid the problems that arise when you try to handle decimal floats in an arbitrary-precision way. (And we would require the use of hexfloats if you need human-readable arbitrary-precision floats.) So I guess now would be the time to decide whether we want to commit to that strategy. If so, then you don't really need to worry about the float parsing at all; you can just defer to rust's built-in parsing, and then extract the sign, mantissa, and exponent from the f64.

tools/data-conversion/src/main.rs Outdated

+                  let mut mantissa_str = inter_rep.mantissa.to_string();
+                  // Determine the position to insert the decimal point
+                  let mut decimal_pos = mantissa_str.len() as i32 - inter_rep.exponent;

Contributor

bcarlet Aug 8, 2024

As above, this seems to be assuming a base-10 intermediate representation.

tools/data-conversion/src/main.rs Outdated

+              fn hex_to_intermediate(hex_string: &str) -> IntermediateRepresentation {
+                  // Get sign value before converting string
+                  let sign = !hex_string.starts_with('-');

Contributor

bcarlet Aug 8, 2024

I'm not sure I understand why binary_to_intermediate assumes two's complement but hex_to_intermediate assumes that the sign is indicated with a - symbol. I guess I would expect these functions to be basically identical other than the radix used for parsing?

tools/data-conversion/src/main.rs Outdated

+              fn intermediate_to_fixed(
+                  inter_rep: IntermediateRepresentation,
+                  filepath_send: &mut Option<File>,
+              ) -> io::Result<()> {

Contributor

bcarlet Aug 8, 2024

I haven't looked through this implementation carefully yet, but I would expect a "to_fixed" function to take the exponent of the fixed-point format as an argument.

Contributor Author

Angelica-Schell Aug 10, 2024

I think for this I am just using the exponent from the intermediate representation, does that make sense?

Angelica-Schell added 10 commits

August 9, 2024 09:07


          Updated from 32 bit width to 64

a134703


          Updated thresholds and corrected error in using thresholds

cbd9d31


          updated intermediate rep functions into helper functions and implemen…

eac7ba3

…ted some other suggestions


          Corrected binary_to_intermediate by requiring bit width from the comm…

85f6e21

…and line.


          Updated from_hex to match the implementation of from_binary

b44ea8c


          corrected bit width command line input so that it is optional"

cb240e2


          Seperation of NumType and FileType

de786e5


          Added U8 Vector Representation

1c154e1


          Started modularizing

21df9c2


          worked on u8 vector -> ir, and edited hex and binary -> u8 vector fun…

15f3cb6

…ctions

Contributor

rachitnigam commented Oct 6, 2024

@bcarlet @Angelica-Schell what needs to be done to get this PR merged? If the goal is to not merge it, then I recommend closing it.

Angelica-Schell closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet