Exists orphan files in secondary server after the repack. #362

Open
luannnh opened this issue Jun 30, 2023 · 5 comments

luannnh commented Jun 30, 2023

Environment

  • Primary Server: db00
  • Secondary Server: db01
  • Using Streaming Replication (physical slot): db00 --> db01
  • Postgres V15
  • OS: RHEL 7

Problem

  • Orphan files remain on the Secondary Server (db01) after running pg_repack on the Primary Server (db00)

More details

  • We have a bloated table called tb_007, which is 100GB.
  • Primary Server: we ran pg_repack on tb_007. After the repack, as expected, the PG_DATA disk space on the Primary Server was reclaimed.
  • With 100GB, it takes time to replicate the data from Primary to Secondary.
  • After 5 hours, the replication was finally in sync, i.e. no lag.
  • Secondary Server: the PG_DATA disk space was not reclaimed (usage is even bigger than on the Primary Server).
  • We decided to wait for 3 days. However, after 3 days, PG_DATA disk usage on the Secondary Server was still large.

Investigation

  • The tb_007 OID is 17500
  • With 100GB, there are about 100 physical files in PG_DATA (one per 1GB segment)
  • Before the repack, we could see 100 physical files named 17500.1, 17500.2, ... on both Primary and Secondary
  • After the repack completed successfully on the Primary, the files 17500.1, 17500.2, ... were removed on the Primary as expected. Of course, at that time these files still existed on the Secondary because of the replication lag.
  • After 3 days, we still see these 17500.1, 17500.2, ... files on the Secondary.
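
For anyone who wants to repeat this file check on both servers, here is a minimal sketch in Python. It assumes you already know the database directory (typically something like `base/<database oid>` under the data directory) and it only looks at main-fork segments, not the `_fsm` or `_vm` forks. The function name `relfilenode_segments` is made up for illustration.

```python
import os
import re


def relfilenode_segments(db_dir, relfilenode):
    """Return [(filename, size_in_bytes), ...] for the main-fork segment
    files of one relfilenode, ordered by segment number.

    Segment 0 is the bare number (e.g. "17500"); later segments carry a
    numeric suffix ("17500.1", "17500.2", ...).
    """
    pattern = re.compile(rf"{int(relfilenode)}(\.\d+)?")
    found = []
    for entry in os.scandir(db_dir):
        # fullmatch so that "175000" or "17500_fsm" are not picked up
        if entry.is_file() and pattern.fullmatch(entry.name):
            found.append((entry.name, entry.stat().st_size))
    # Sort numerically by segment suffix; the bare name counts as segment 0
    found.sort(key=lambda item: int(item[0].partition(".")[2] or 0))
    return found
```

Running it for 17500 on both db00 and db01 and comparing the number of files and the summed sizes shows whether the old segments are still present on only one side.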

Note: this is similar to #334


luannnh commented Jun 30, 2023

It's too soon to conclude anything.
I think it's possible that when repack swaps the files (which belong to tb_007), the swap happens only on the Primary at the physical level and does not generate any WAL, so this action is not replicated to the Secondary.

@za-arthur
Collaborator

> I think it's possible that when repack swaps the files (which belong to tb_007), the swap happens only on the Primary at the physical level and does not generate any WAL, so this action is not replicated to the Secondary.

Can you check this on your primary and a replica? You can use the query:

select relfilenode from pg_class where relname = 'tb_007' and relnamespace = 'public'::regnamespace;

The result should be the same on both instances.

I just checked this on one of our instances and it looks correct.


luannnh commented Jun 30, 2023

Hi @za-arthur

The number is the same on both Primary and Secondary.
The old relfilenode was 17500. After the repack, the new relfilenode is 18500 on both Primary and Secondary.

The only difference is

  • the files of the old relfilenode = 17500 were removed on the Primary
  • but they still exist on the Secondary
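
A rough way to look for such leftovers in general, sketched in Python under some assumptions: `filenames` is a directory listing of the database directory on the Secondary, and `live_filenodes` is the set of filenodes the catalog still knows about, e.g. collected with `pg_relation_filenode(oid)` over `pg_class` (the plain `relfilenode` column is 0 for mapped catalogs). The function name `orphan_candidates` is hypothetical, and the result is only a list of candidates: files belonging to in-flight transactions can show up too, so nothing here should be deleted without further checks.

```python
import re
from collections import defaultdict

# Matches relation files such as "17500", "17500.3", "17500_fsm", "17500_vm.1".
# Other files (PG_VERSION, pg_filenode.map, temp relations like "t3_16450")
# do not match and are ignored.
_RELATION_FILE = re.compile(r"(\d+)(?:_(?:fsm|vm|init))?(?:\.\d+)?")


def orphan_candidates(filenames, live_filenodes):
    """Group on-disk relation files by filenode and return only the groups
    whose filenode is not in the set of live filenodes."""
    live = {int(n) for n in live_filenodes}
    orphans = defaultdict(list)
    for name in filenames:
        match = _RELATION_FILE.fullmatch(name)
        if match and int(match.group(1)) not in live:
            orphans[int(match.group(1))].append(name)
    return {filenode: sorted(names) for filenode, names in orphans.items()}
```

With the numbers from this issue, a listing that contains both 17500.* and 18500.* files and a live set containing only 18500 would report the 17500 files as candidates.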

za-arthur added the bug label Jun 30, 2023
@za-arthur
Collaborator

Unfortunately, I haven't been able to reproduce this in my local environment on Postgres 15 so far.
Also, if I'm not mistaken, we haven't had such an issue in our production environment.

But since this isn't the first issue reported about this, it is worth investigating further. Any additional details you find would help.


luannnh commented Jul 2, 2023

@za-arthur we tried to reproduce it: streaming replication (physical slot) + PG V15 + pg_repack (latest version) on a small table (< 100MB). The repack worked as expected.
However, when we tried with a 70GB table, we got the same issue: orphan files exist on the secondary server.
