Skip to content

feat: make Gtfs and RawGtfs structs clonable#181

Open
HelloJowet wants to merge 2 commits intorust-transit:mainfrom
HelloJowet:feat/clonable_structs
Open

feat: make Gtfs and RawGtfs structs clonable#181
HelloJowet wants to merge 2 commits intorust-transit:mainfrom
HelloJowet:feat/clonable_structs

Conversation

@HelloJowet
Copy link

I wan't to be able to clone the Gtfs and RawGtfs structs.

I need this, because the validate_and_metadata function in the transport-validator repo takes ownership of a RawGtfs instance. By making these structs clonable, I can perform additional tasks in my program after validation without having to read the GTFS feed from the source twice.

@Tristramg
Copy link
Collaborator

Hello, thank you for the pull request. This would however require some thoughts.

Not making it clonable was a design choice: the GTFS data can be quite large, and cloning would immediately double the memory usage, and we put some efforts to keep it moderately controled.
That is also why the Gtfs::try_from(raw) consumes the raw gtfs to reduce re-allocations. Maybe a compromise would be to create Gtfs::try_from_clone(&raw) (with a better name) that would allow you to pass a reference to validate_and_metadata as a reference.

Maybe that function could also be renamed to store more the idea of a pipeline of treaments, it already does two unrelated things (metadata, validation) and could do the extra step you are planing?
Out of curiosity, what is the additional task you have in mind?

@HelloJowet
Copy link
Author

@Tristramg Thanks for your response.

Not making it clonable was a design choice: the GTFS data can be quite large, and cloning would immediately double the memory usage, and we put some efforts to keep it moderately controled.
That is also why the Gtfs::try_from(raw) consumes the raw gtfs to reduce re-allocations.

I definitely agree with this approach. If we can solve my use case without cloning the RawGtfs instance, that would be ideal to keep the memory footprint small.

Maybe a compromise would be to create Gtfs::try_from_clone(&raw) (with a better name) that would allow you to pass a reference to validate_and_metadata as a reference.

I’m not sure if a Gtfs::try_from_clone(&raw) function would ultimately solve the underlying issue. Even if we initialize from a reference, we’d still likely need to clone the internal properties (like calendar or stops) to give the new Gtfs struct ownership, which brings us back to the same memory overhead.

[...] that function [...] already does two unrelated things (metadata, validation) [...]

To be honest, I’m also not super happy with how validate_and_metadata is currently designed. You’re right - it’s doing two separate things.

I thought about implementing the validate_and_metadata function myself. This way I could continue using the Gtfs instance after running the validation without the need to clone anything. The problem is that I want to use the RawGtfs instance instead for the additional tasks, because I want to use properties like RawTrip and RawStopTime instead of Trip and StopTime.

Out of curiosity, what is the additional task you have in mind?

The GTFS reading and validation is just one phase of the project I'm working on. I'm almost finished with a converter from GTFS csv files into a SQLite database and I'm planning to create a GTFS to NeTEx converter too. I’m very much open to making this work open source once it's more mature.

For now, the current solution of making the RawGtfs struct clonable is working for me. It’s not perfect, but it covers my immediate needs. I completely understand if you're hesitant to merge it. If you have a preferred alternative approach or a specific vision for the refactor, let me know and I’d be happy to give it a try.

@Tristramg
Copy link
Collaborator

Ok, I’ll have a look if it’s easy to protect the clone with a feature flag (as it is a derived, right now I’m not sure how it would work). How urgent is it?

For information, there is this converter that was built for transport.data.gouv.fr to convert gtfs to netex: https://github.com/hove-io/transit_model/blob/master/gtfs2netexfr/README.md

@Tristramg
Copy link
Collaborator

#[derive(Debug)]
#[cfg_attr(feature = "clone", derive(Clone))]
pub struct RawGtfs {

Ok, that was easier than I thought. Would it be OK for you to hide the clone ability behind a feature flag to avoid cloning by accident?

@antoine-de do you have an opinion on this?

@antoine-de
Copy link
Collaborator

hum, maybe we could change the validator and make it return (optionally?) the built GTFS?

it seems it need the ownership to be able to do validations on a RawGTFS, convert it to a GTFS and do more validations.

Wouldn't it solve all your problems if it was possible to get back the whole GTFS at the end?

Added Deserialize trait to ObjectType enum for GTFS.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants