Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inter rep #2256

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Inter rep #2256

wants to merge 14 commits into from

Conversation

Angelica-Schell
Copy link
Contributor

This pull request includes intermediate representation functions to and from binary, hex, fixed, and float.

Copy link
Contributor

@bcarlet bcarlet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start @Angelica-Schell! I did leave a few notes where I think there are potential correctness problems. Accordingly, it would be great to get some testing set up for this. There's one fairly easy approach though: just add a flag that lets you force the use of the IR even for small numbers, and then do differential testing on your two implementations, using your existing test-case generator.

@@ -128,44 +142,108 @@ fn convert(
match (convert_from, convert_to) {
(NumType::Hex, NumType::Binary) => {
for line in read_to_string(filepath_get).unwrap().lines() {
hex_to_binary(line, &mut converted)
if line.parse::<f32>().unwrap() <= FAST_TRACK_THRESHOLD as f32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make the selection between fast-path and IR implementation based on the formats specified on the command line rather than the size of each individual value? That would require some error checking in the fast-path functions, but I'd imagine it would still be faster because it avoids doing the parsing twice. I suppose the downside is that you might miss out on the opportunity to use the fast path for values that are smaller than the format allows, so I'm not sure what the right tradeoff is.

@@ -128,44 +142,108 @@ fn convert(
match (convert_from, convert_to) {
(NumType::Hex, NumType::Binary) => {
for line in read_to_string(filepath_get).unwrap().lines() {
hex_to_binary(line, &mut converted)
if line.parse::<f32>().unwrap() <= FAST_TRACK_THRESHOLD as f32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your value of FAST_TRACK_THRESHOLD is overly conservative in this instance. Looking at the implementation of hex_to_binary, I don't see any reason why the implementation wouldn't work for widths up to 32. In general, each of your fast-path implementations will probably have different requirements (and some may have requirements on the output format too, not just the input).

///
/// This function will panic if the input string cannot be parsed as a binary number.

fn binary_to_intermediate(binary_string: &str) -> IntermediateRepresentation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor organizational suggestion: perhaps these should be associated functions, with names like from_binary. (This is the rust convention for making "constructors".)

/// This function will panic if the input string cannot be parsed as a binary number.

fn binary_to_intermediate(binary_string: &str) -> IntermediateRepresentation {
let bit_width = binary_string.len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember if we've already discussed this, but I'm not sure it's the right choice to infer the width of the format from the values? (And thereby implicitly requiring that users pad all values to the correct length.) My intuition is that the width is part of the format specified on the command line, and we shouldn't require that users pad their values.

// Threshold for using fast-track functions
const FAST_TRACK_THRESHOLD: u32 = 1 << 24;

struct IntermediateRepresentation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct should probably have some documentation. For instance, does sign == true indicate that the number is negative, non-negative, etc?

-BigInt::from(inter_rep.mantissa)
};

let binary_str = inter_value.to_str_radix(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will format negative numbers with a - sign instead of writing out their two's complement representation. The other "to_binary" functions appear to do the latter, so something seems inconsistent here.

IntermediateRepresentation {
sign,
mantissa: BigUint::from_str(&mantissa_string).expect("Invalid number"),
exponent: -(fractional_part.len() as i32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be assuming that the intermediate representation uses base-10 "scientific notation," but I think we want base-2?

But taking a step back, in the past we've discussed defining decimal strings as representing double-precision floats, to avoid the problems that arise when you try to handle decimal floats in an arbitrary-precision way. (And we would require the use of hexfloats if you need human-readable arbitrary-precision floats.) So I guess now would be the time to decide whether we want to commit to that strategy. If so, then you don't really need to worry about the float parsing at all; you can just defer to rust's built-in parsing, and then extract the sign, mantissa, and exponent from the f64.

let mut mantissa_str = inter_rep.mantissa.to_string();

// Determine the position to insert the decimal point
let mut decimal_pos = mantissa_str.len() as i32 - inter_rep.exponent;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this seems to be assuming a base-10 intermediate representation.


fn hex_to_intermediate(hex_string: &str) -> IntermediateRepresentation {
// Get sign value before converting string
let sign = !hex_string.starts_with('-');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why binary_to_intermediate assumes two's complement but hex_to_intermediate assumes that the sign is indicated with a - symbol. I guess I would expect these functions to be basically identical other than the radix used for parsing?

fn intermediate_to_fixed(
inter_rep: IntermediateRepresentation,
filepath_send: &mut Option<File>,
) -> io::Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked through this implementation carefully yet, but I would expect a "to_fixed" function to take the exponent of the fixed-point format as an argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this I am just using the exponent from the intermediate representation, does that make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants