From: | Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Help required to debug pg_repack breaking logical replication |
Date: | 2017-10-07 18:37:27 |
Message-ID: | CA+mi_8YWReON2gVk9qoeJLRVzWTSSoLkKAX9DERL36-n4Y8rZg@mail.gmail.com |
Lists: | pgsql-hackers |
Hello,
we have received reports of pg_repack breaking logical replication, and
I have experienced it myself a couple of times:
- https://github.com/reorg/pg_repack/issues/135
- https://github.com/2ndQuadrant/pglogical/issues/113
In my experience, after the botched run, the replication slot was
"stuck", and any attempt to read from it (including
pg_logical_slot_peek_changes()) blocked until ctrl-c. I've tried
reproducing the issue but the first attempts have failed to fail.
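To check a slot without hanging the session, something like the following could help. It is only a sketch: the slot name `repack_test` and the helper `slot_probe_sql` are made up for illustration, and it assumes the blocked call reacts to `statement_timeout` the same way it reacted to ctrl-c, which I have not verified. The helper only builds the SQL text, to be fed to psql or any client.

```python
import re


def slot_probe_sql(slot_name, timeout_ms=5000, max_changes=10):
    """Build SQL that peeks at a logical slot but gives up after timeout_ms.

    Hypothetical helper: it only returns the SQL text, it does not run it.
    """
    # Slot names may only contain lowercase letters, digits and underscores.
    if not re.fullmatch(r"[a-z0-9_]+", slot_name):
        raise ValueError(f"invalid replication slot name: {slot_name!r}")
    return "\n".join([
        # A bare integer for statement_timeout is interpreted as milliseconds.
        f"SET statement_timeout = {int(timeout_ms)};",
        # Peek (do not consume) at most max_changes changes; NULL = no LSN limit.
        "SELECT count(*) FROM pg_logical_slot_peek_changes("
        f"'{slot_name}', NULL, {int(max_changes)});",
        "RESET statement_timeout;",
    ])


if __name__ == "__main__":
    print(slot_probe_sql("repack_test"))
```

If the assumption holds, a healthy slot returns a count, while a "stuck" one should end with a statement timeout error instead of blocking forever.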
In the above issue #113, Petr Jelinek commented:
> From quick look at pg_repack, the way it does table rewrite is almost guaranteed
> to break logical decoding unless there is zero unconsumed changes for a given table
> as it does not build the necessary mappings info for logical decoding that standard
> heap rewrite in postgres does.
Unfortunately he didn't follow up on requests for further details.
I've started drilling down into the problem, observing that the swap
function, swap_heap_or_index_files() [1], was cargo-culted by the
original author from the CLUSTER command code as of PG 8.2 [2] (with a
custom addition to update the relfrozenxid, which seems backwards to me,
as it sets the older frozen xid on the new table [3]).
[1] https://github.com/reorg/pg_repack/blob/ver_1.4.1/lib/repack.c#L1082
[2] https://github.com/postgres/postgres/blob/REL8_2_STABLE/src/backend/commands/cluster.c#L783
[3] https://github.com/reorg/pg_repack/issues/152
So that code is effectively missing a good 10 years of development.
Before jumping into fast-forwarding it, I would like to ask for some
help, i.e.
- Is Petr's diagnosis right, i.e. is the missing mapping to blame for
the logical replication freeze?
- Can you suggest a test to reproduce the issue reliably?
- What are mapped relations anyway?
Thank you in advance for any help (either info about how to fix the
issue properly, or a patch by someone who happens to really understand
what we are talking about).
-- Daniele