Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transactions info (time) in unit events ? #19

Open
cyrilchapon opened this issue Dec 13, 2022 · 2 comments
Open

Transactions info (time) in unit events ? #19

cyrilchapon opened this issue Dec 13, 2022 · 2 comments

Comments

@cyrilchapon
Copy link

Hi,

I'm implementing a CDC use-case and found your library which is pretty cool (thanks for this 🙂).

Reading #17 + some Debezium posts saying they deprecated wal2json usage for pgoutput; I came to the conclusion I'd better go with the later.

I have tested both though, and one thing I liked with wal2json (v2) is the fact we have access to the commit-time of the transaction directly inside unit event (insert, update, ...)

Which is apparently not the case with pgoutput.

So my question is simple :

Is the fact a pgoutput DDL event (insert, update, delete) with PgOutPutPlugin due to

  • a lack of this piece of information directly in WAL
  • a lack while decoding the WAL in PgOutputPlugin
    ?

If the later, I'd try to PR this; but could you please give me some clues where and how to do so ?

Thanks 🙂

@kibae
Copy link
Owner

kibae commented Dec 14, 2022

Hello, @cyrilchapon 😃

Obviously wal2json has the advantage of handling transactions as a block.
However, wal2json is very slow or doesn't work well when a large number of records change. In this case, pgoutput is very useful.

When using the pgoutput plugin, the start time of the transaction(commitTime) is delivered.

private msgBegin(reader: BinaryReader): MessageBegin {
// TODO lsn can be null if origin sended
// https://github.com/postgres/postgres/blob/85c61ba8920ba73500e1518c63795982ee455d14/src/backend/replication/pgoutput/pgoutput.c#L409
// https://github.com/postgres/postgres/blob/27b77ecf9f4d5be211900eda54d8155ada50d696/src/include/replication/reorderbuffer.h#L275
return {
tag: 'begin',
commitLsn: reader.readLsn(),
commitTime: reader.readTime(),
xid: reader.readInt32(),
};
}

I think it could be used while holding this time value as a member variable until commit.

@mastermatt How about this idea?

@mastermatt
Copy link
Collaborator

This pushes past my knowledge, notably around the fact that I've only ever needed to work with features available with protocol version 1.
The issue I see off the bat, is how to do you correlate other message events to the correct BEGIN event.

@cyrilchapon you said DDL events, but I just to make sure we're on the same page, pgoutput does not include DDL events you'll get DML (insert, update, delete, etc.).

It seems the BEGIN event has a transaction ID, then subsequent DML events can reference that ID.
From the docs:

Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

As I've stated, I've never worked with streamed transactions in this context and I don't know the gotchas.
I also don't understand how the BEGIN event can have a commit timestamp if the transaction is being streamed.
One fear I'd have with having this lib owning the cache of the xid is knowing when to clear the cache. If it's streaming transactions, how guaranteed is it that the commit event will eventually be processed? and if not, is that a memory leak?

tl;dr: I don't know. But I'm curious, if anyone figures it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants