Scalable, Flexible Data Conversion - Summer 2024 Lab Notebook #2088
Here's an overview of what I'm working on this summer! Last semester, I started building a tool that supports fast and flexible conversions between numerical formats. It currently converts to and from binary, hexadecimal, floating-point, and fixed-point numbers. I will continue to work on this tool, with an emphasis on making it fast and tailoring it to the needs of the Capra community. My first next step is converting to and from fixed point by bit-slicing instead of the current approach, which uses multiplication. I will also create a test suite and do some basic benchmarking of the two ways of converting to and from fixed point. After that, I'm going to create an intermediate representation, accept numbers wider than 64 bits, and integrate with fud2. There's more, but I think this is a good start! I will be posting updates each Thursday for now :)
This week's update is pretty short. Most of my time went towards fixing a GitHub problem that wouldn't allow me to merge. I did wind up learning a good amount about branches, but the main lesson is to make sure I pull before trying to commit any changes. I was also able to get started on converting to and from fixed-point numbers using bit-slicing instead of multiplication, which should be pretty quick to finish. My next step is to create a test suite and benchmark the speed difference between the two ways of converting.
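To make the two approaches concrete, here is a minimal sketch of both, assuming an unsigned 32-bit raw value with a known number of fractional bits. The function names, the `u32` width, and the `f64` return type are assumptions for illustration and are not taken from the actual tool. Note that this bit-slicing version still does one float division at the end to produce a float result.

```rust
/// Float-arithmetic version: scale the whole raw integer by 2^-frac_bits.
/// Hypothetical helper; assumes an unsigned value and frac_bits < 32.
fn fixed_to_f64_mul(raw: u32, frac_bits: u32) -> f64 {
    raw as f64 * 2f64.powi(-(frac_bits as i32))
}

/// Bit-slicing version: split the raw bits into an integer field and a
/// fractional field with a shift and a mask, then combine the two parts.
/// Hypothetical helper; assumes an unsigned value and frac_bits < 32.
fn fixed_to_f64_slice(raw: u32, frac_bits: u32) -> f64 {
    // Upper bits hold the integer part.
    let int_part = raw >> frac_bits;
    // Lower frac_bits bits hold the fractional part.
    let frac_mask = (1u32 << frac_bits) - 1;
    let frac_part = raw & frac_mask;
    // The fractional field counts units of 2^-frac_bits.
    int_part as f64 + frac_part as f64 / (1u64 << frac_bits) as f64
}
```

For example, with 4 fractional bits the bit pattern `0110.1000` should give the same result from both functions: an integer part of 6 plus a fractional part of 8/16.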
Apologies that it's been a while since I updated here; it's been a little crazy. But I finally finished the test suite for the binary-to-fixed functions!

I initially started out with Runt in mind and made a Python script that creates n input files and n expect files. I also had to change my Rust function to optionally print to stdout so that its output could be compared with the output of the Python script. This wound up not working out that well, however, because Runt performed a direct string comparison of the outputs rather than evaluating the numerical values, and trying to get the Python output to look like Rust's was probably more trouble than it was worth. Instead, I modified the Python script to capture the output of my Rust function and compare the actual numerical values. This approach worked well; I just had to adjust the script to generate inputs within the acceptable range of conversion, since some numbers could not be represented properly if they were too large when converted to f32.

That worked well to test accuracy, but we also want to test for speed, and specifically the difference in speed between the version of the binary-to-fixed function that uses bit manipulation and the one that uses float arithmetic. So, I modified the Python script to generate a single file with n entries instead of n .in files. I then used hyperfine to measure the difference in speed between the two functions. The bit manipulation version wound up being ~1.12 ± 0.06 times faster than the float arithmetic version.

I am now working on (and hopefully finishing up) functions that convert each of the 4 data types the tool currently handles (binary, hex, float, fixed) to and from an intermediate representation. We agreed that the intermediate representation should have three fields: a sign, a mantissa, and an exponent.
The sign indicates whether the number is positive or negative, the mantissa represents the numerical part of the number, and the exponent determines the position of the decimal point. I have working versions of all eight functions and now need to test them more extensively. Since the mantissa is a bignum, this also takes care of accepting inputs wider than 64 bits. The last thing is that we added a few items to the summer to-do list, including an option for accepting binary input files and creating binary output files, which I will work on after completing the intermediate representation.
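As a rough illustration of the description above, a sign/mantissa/exponent representation might be sketched like this. Everything here is an assumption rather than the agreed-upon definition: the type and field names, the base-2 exponent, the `true`-means-negative sign convention, and especially the `u128` mantissa, which is only a stand-in for a real big-number type so that the sketch needs no external crates.

```rust
/// Hypothetical intermediate representation.
/// Assumed meaning: value = (-1)^sign * mantissa * 2^exponent.
#[derive(Debug, Clone, PartialEq)]
struct IntermediateRepresentation {
    /// true means negative (assumed convention).
    sign: bool,
    /// Numerical part of the value; u128 stands in for a bignum here.
    mantissa: u128,
    /// Power of two applied to the mantissa (assumed base 2).
    exponent: i32,
}

impl IntermediateRepresentation {
    /// Builds the representation from an unsigned binary string that has
    /// `frac_bits` fractional bits. Returns None if the string is not
    /// valid binary or does not fit in the stand-in mantissa type.
    fn from_binary(bits: &str, frac_bits: u32) -> Option<Self> {
        let mantissa = u128::from_str_radix(bits, 2).ok()?;
        Some(Self {
            sign: false,
            mantissa,
            exponent: -(frac_bits as i32),
        })
    }

    /// Converts to f64; precision is lost for mantissas wider than 53 bits.
    fn to_f64(&self) -> f64 {
        let magnitude = self.mantissa as f64 * 2f64.powi(self.exponent);
        if self.sign { -magnitude } else { magnitude }
    }
}
```

With this shape, each source format only needs one function into the representation and one function out of it, instead of a separate function for every pair of formats.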
Lab notebook for questions/updates on the data conversion project of the summer of 2024