Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-01-07 18:50:04
Message-ID: 3466340.1673117404@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

=?UTF-8?B?w5ZuZGVyIEthbGFjxLE=?= <onderkalaci(at)gmail(dot)com> writes:
> Attached v22.

I took a very brief look through this. I'm not too pleased with
this whole line of development TBH. It seems to me that the core
design of execReplication.c and related code is "let's build our
own half-baked executor and much-less-than-half-baked planner,
because XXX". (I'm not too sure what XXX was, really, but apparently
somebody managed to convince people that that is a sane and
maintainable design.) Now this patch has decided that it *will*
use the real planner, or at least portions of it in some cases.
If we're going to do that ISTM we ought to replace all the existing
not-really-a-planner logic, but this has not done that; instead
we have a large net addition to the already very duplicative
replication code, with weird restrictions because it doesn't want
to make changes to the half-baked executor.

I think we should either live within the constraints set by this
overarching design, or else nuke execReplication.c from orbit and
start using the real planner and executor. Perhaps the foreign
key enforcement mechanisms could be a model --- although if you
don't want to buy into using SPI as well, you probably should look
at Amit L's work at [1].

Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
sanely defined in the first place? It looks to me that
RelationFindReplTupleSeq assumes without proof that there is a unique
full-tuple match, but that is only reasonable to assume if there is at
least one unique index (and maybe not even then, if nulls are involved).
If there is a unique index, why can't that be chosen as replica identity?
If there isn't, what semantics are we actually providing?

What I'm thinking about is that maybe REPLICA IDENTITY FULL should be
defined as "the subscriber can pick any unique index to match on,
and is allowed to fail if the table has none". Or if "fail" is a bridge
too far for you, we could fall back to the existing seqscan logic.
But thumbing through the existing indexes to find a non-expression unique
index wouldn't require invoking the full planner. Any candidate index
would result in a plan estimated to fetch just one row, so there aren't
likely to be serious cost differences.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg(at)mail(dot)gmail(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Koshakow 2023-01-07 20:04:25 Re: Infinite Interval
Previous Message Pavel Stehule 2023-01-07 18:09:11 Re: [RFC] Add jit deform_counter