Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🛠️ Refactor suggestion: Fragile string-based parsing of byte sizes #1187

Closed
de-sh opened this issue Feb 13, 2025 · 6 comments
Closed

🛠️ Refactor suggestion: Fragile string-based parsing of byte sizes #1187

de-sh opened this issue Feb 13, 2025 · 6 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@de-sh
Copy link
Contributor

de-sh commented Feb 13, 2025

Splitting on " Bytes" and re-parsing is susceptible to format changes. A more robust approach is to store sizes as numeric fields within the QueriedStats, IngestionStats, and StorageStats, then only format them when returning or displaying. This prevents runtime errors if the format changes and simplifies logic.

 pub fn merge(stats: Vec<Self>) -> Self {
     // Instead of brandishing " Bytes" mid-calculation, define numeric fields in these structs:
     //   - e.g., `pub size_in_bytes: u64`
     // Summation is then just numeric addition.
     // Then, in display or response, format with "Bytes".
 }

Also applies to: 95-128, 130-136

Originally posted by @coderabbitai[bot] in #1150 (comment)

@de-sh de-sh changed the title _:hammer_and_wrench: Refactor suggestion_ **Fragile string-based parsing of byte sizes.** 🛠️ Refactor suggestion: Fragile string-based parsing of byte sizes Feb 13, 2025
@de-sh de-sh added enhancement New feature or request good first issue Good for newcomers labels Feb 13, 2025
@ArjunKrish7356
Copy link

hey @de-sh i would like to work on this. Could you assign this to me .

@ArjunKrish7356
Copy link

Hey a quick question . I checked up on the issue and i can approach this in 2 ways .

One i can directly convert the fields that accept the size (as string) to u64 so it directly stores the size as numerical value , but in this case i would have to find the locations where the method merge_quired_stats are called and modify the inputs as u64 from string .
size : String ==> replaced by ==> size_in_bytes: u64

The other approach is to let the Structs (IngestionStats , StorageStats ) accept the value of size as string but we will add 3 new fields for each 3 size ( size , lifetime_size , deleted_size ) and when calling the ( ::new ) method we will convert the string val to u64 and store in the 3 new fields , so while calcuation we can use the u64 fields.
size: String and size_in_bytes: u64 both fields will be in the struct

Which one would you prefer from above or is there another approach that i should look into ?

@de-sh
Copy link
Contributor Author

de-sh commented Mar 1, 2025

The right way to do this is to hold numbers in memory and deserialize/serialize from where necessary as string "x Bytes".

Refer previous work done here:
https://github.com/parseablehq/parseable/pull/1022/files#diff-c3ece837aee30565f47a2ccbebaa9fc947d032dfc1c14b8f70811a03203cf305R59-R64

@ArjunKrish7356
Copy link

ArjunKrish7356 commented Mar 5, 2025

Is the de-sh:query-param repository the correct one where I should focus my contributions?

Also according to my understanding, the structs will continue to accept values as they currently do. However, I will introduce functions to store these values as u64 in memory for efficient computation. Additionally, I will modify the serde serializer to ensure that when the struct is converted to JSON the values are formatted as strings in the "X Bytes" format. Is this correct ?

let stats = if let Some(mut ingestor_stats) = ingestor_stats {
        ingestor_stats.push(stats);
        QueriedStats::merge(ingestor_stats)
    } else {
        stats
    };

    Ok(HttpResponse::build(StatusCode::OK).json(stats))
}

@de-sh
Copy link
Contributor Author

de-sh commented Mar 5, 2025

Also according to my understanding, the structs will continue to accept values as they currently do. However, I will introduce functions to store these values as u64 in memory for efficient computation. Additionally, I will modify the serde serializer to ensure that when the struct is converted to JSON the values are formatted as strings in the "X Bytes" format. Is this correct ?

No, we need to serialize and deserialize into integers that will only remain as such in-memory. Serializing and deserialization should happen as "X Bytes"

Is the de-sh:query-param repository the correct one where I should focus my contributions?

You can make the changes against main branch here

@ArjunKrish7356
Copy link

It looks like someone else addressed and resolved this issue before I could push my changes. I will look into other issues.

@de-sh de-sh mentioned this issue Mar 6, 2025
3 tasks
@de-sh de-sh closed this as completed Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants