Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update CorrelationId regex to support UUIDv7 #131

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

PaulGregoryBaker
Copy link

This change will relax the regex validation on the CorrelationId to support UUIDv7 Unique Identifier formats. This change is required to enable performance on the current mojaloop implentation in MySQL.

Current performance tests show that there is a 70% drop off in performance in the soak test when using a UUICv4. This is the default Unique Identifier. UUIDv7 does not have this drop off in performance.

New Identifier for ISO20022: There is an existing search for a new unique identifier that is shorter in length that is required for the ISO20022 API implementation. As the identifier has not yet been selected, and that this issue is currently blocking efforts in the performance workstream, the proposal is to make this adjustment as an interum measure.

@kalinkrustev
Copy link

There are also other places where this regex pattern can be seen https://github.com/search?q=repo%3Amojaloop%2Fmojaloop-specification%20%5B1-5%5D%5B0-9a-f%5D%7B3%7D&type=code

Also the build fails, which needs some investigation too.

@henrka
Copy link
Contributor

henrka commented Jun 3, 2024

There are also other places where this regex pattern can be seen https://github.com/search?q=repo%3Amojaloop%2Fmojaloop-specification%20%5B1-5%5D%5B0-9a-f%5D%7B3%7D&type=code

Also the build fails, which needs some investigation too.

To avoid breaking changes for existing implementations of FSP Interoperability API, the pattern cannot be changed in version 1.x which this search also includes. It should only be changed for version 2.0.

Additionally, as far as I know, the actual pattern to use in 2.0 is still under discussion, so I don't know why this change should be done now..

@PaulGregoryBaker
Copy link
Author

@henrka, the current regex validation is currently blocking the performance workstream from being able to deliver their work in a standard way.

@henrka
Copy link
Contributor

henrka commented Jun 3, 2024

And why can't they change that locally? I don't want to change this for everyone just because some work stream has some current issue..

@elnyry-sam-k
Copy link
Member

hi @PaulGregoryBaker and @henrka , my suggestion was to consider this change for v2.0, which will be a breaking change for FX features anyway.

And, the PR was raised against the file used for the v2.0 draft as well, so that seems to be alignment. Performance or not, we agree that this cannot happen in 1.x versions, hence will go with v2.0

@henrka
Copy link
Contributor

henrka commented Jun 3, 2024

It seems a bit "early" for me to have this in 2.0 now as we haven't decided on which format to use yet. I know that 2.0 is still work in progress, but to me it seems wrong to change this now based on output from a performance work stream without any more analysis, as well as parallel work to investigate the format in #120.

@PaulGregoryBaker
Copy link
Author

In my opinion, the CCB as an important body in an open source project, should not be holding up other community contributions unnecessarily. Waiting for a decision on #120 would be doing just that. (Unless that can be resolved in the next few days.)

@henrka
Copy link
Contributor

henrka commented Jun 4, 2024

In what way is the CCB holding up community contributions? Each change needs to be properly analyzed and approved. The input from the work stream is highly valued, as is any other contribution. How would it help to have a temporary solution in 2.0 that hasn't been properly analyzed by anyone except the work stream? We are at the same time analyzing other solutions in #120, which could mean that another format should be used in 2.0, so I don't fully understand how changing this temporarily just to satisfy the performance work stream contribution for a potentially short time would help. In an API, I value stability more than potentially temporary changes, even if it is for a future version.

@kalinkrustev
Copy link

#120 did not take into account the performance implications of CUID2.

The readme for CUID2 contains a Note on K-Sortable/Sequential/Monotonically Increasing Ids, which recommends the use of createdAt fields. The problem is that this is not possible to do for Mojaloop without quite substantial effort, as in many cases the generated ids are primary keys and any non-sequential id generators lead to quite bad performance when data is accumulated. Avoiding this issue when non-sequential ids are generated will require a lot of effort in restructuring the database and probably reworking the table lookups to use a time range, as the primary key must be changed. The changes might even affect some of the logic of the flows. The issue is probably related to not just the SQL database, but also other places where we are likely to store and index the data, like log aggregators, etc.

The main claim of CUID2 vs sequential ids is the leak of timestamps, but this is just a generic claim and we should consider that many of the ids we generate are only significant for a short period of time, given the real-time nature of the functionality they are associated with. So this "leak" is not really an issue in our case. This leak is only significant when associated with entities that are not so much real-time related, like account creation, customer creation, etc. Instead of worrying for a leak, maybe better think about improving the logic and restrict any operations that relate to IDs outside of a certain timeframe.

The section also recommends the use of cloud solutions and in-memory databases, which is not the inclusivity we are working for and their use is often restricted by regulation. I think the allowed id generators should not be so restrictive, as for example the most important thing we want to restrict is the length, and even this can be probably parametrized during DB creation. Accepting only CUID2 will feel like a win for the cloud providers, not for inclusivity.

@henrka
Copy link
Contributor

henrka commented Jun 4, 2024

@kalinkrustev There has been a proposal to use CUID2 in #120, but there is no resolution, so to say it did not take into account the performance implications of CUID2 is not really correct (it can't be as there is no final solution yet). Performance has been and is discussed on the SIG meetings, which is why are looking into alternatives of CUID2.

Please correct me if I'm wrong, but I don't see any input regarding potential performance implications in #120. You are very welcome to add more background in that issue regarding the performance implications.

I fully understand that this has been analyzed from a performance perspective on Mojaloop, but we can't just change formats without any further analysis, as the API has implications on other systems as well.

@kalinkrustev
Copy link

@henrka , yes you are right that it is a proposal. I did not express correctly my thoughts, which were meant to say that performance was not considered so far in the comments there.

I was wondering where to put my comments, so maybe I need to move them to #120, just I wanted for now to keep the discussion more focused. The thing is that during performance tests, I noticed that UUID regex was only checked in the quoting service, while transfers were able to execute with UUID v7. So whatever we do to fix this situation, there will be a change. Isn't it better to change in the direction of allowing in quotes what was already allowed in transfers and update the docs? Otherwise if we restrict the transfers as per what the docs say, we can break something, that was allowed so far. Are you aware of particular implications on other systems? The change seems to be very small. Allowing more id formats seems to be opening Mojaloop to easier integrations with other systems.

The RFC 4122 for UUID v4 is already obsoleted by the RFC 9562, which includes v7.

@henrka
Copy link
Contributor

henrka commented Jun 4, 2024

@kalinkrustev If I'm not mistaken, it seems like you are referring to the Mojaloop switch implementation when you say "quoting service"? I don't know much about that implementation, I just know of the implementation in one FSP platform (and the API).

Unfortunately, just because a change seems small doesn't mean that the implications in surrounding systems is small.. Additionally, we would like to keep the format shorter than 36 characters for it to be able to fit in ISO 20022 (see #120), so UUID 7 would probably not be the best way forward.

@PaulGregoryBaker, you had some other suggested ID standard that you mentioned in the last SIG which should be good for performance as well as keeping the ID shorter than 36 characters? I can't seem to remember the acronym currently..

But we should move this discussion to #120 so that it is easier to follow for everyone else.

@kalinkrustev
Copy link

OK, let's move the discussion there.

I think @PaulGregoryBaker might have mentioned https://github.com/ulid/spec, which is generated in a way similar to UUID v7, but serialized in a better way to take less characters. Just this will be a lot bigger change with more implications.

@henrka
Copy link
Contributor

henrka commented Jun 4, 2024

Thanks @kalinkrustev, yes that was the one. Changes will be bigger, but as 2.0 is a new major version that should be fine from a version change perspective at least.

@elnyry-sam-k
Copy link
Member

@henrka and @MichaelJBRichards

For the question we had: "Is it ok to use this ULID v7: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-01.html#name-uuidv7-layout-and-bit-order ? The format itself is "Simplified BSD License" and the GitHub repo for the specification says that it is GPL-3.0: https://github.com/ulid/spec, but the libraries we intend to use are of MIT (here: https://github.com/ulid/javascript ) which is compatible with our license Apache 2.0, hence the concern."

--
We have an OK on this now from the legal expert.

We're ok to go ahead and use the libraries (as long we don't modify them).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants