Re: [PATCH] Logical decoding support for sequence advances

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: [PATCH] Logical decoding support for sequence advances
Date: 2016-02-29 02:23:10
Message-ID: CAMsr+YE+zR1hvhiG8hdNTShBJ51O0_5vR8teia45Bh_NmgB3Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 December 2015 at 10:08, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 15 December 2015 at 20:17, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>>
>> I think this is quite the wrong approach. You're calling the logical
>> decoding callback directly from decode.c, circumventing
>> reorderbuffer.c. While you don't need the actual reordering, you *do*
>> need the snapshot integration.
>
>
> Yeah. I thought I could get away without it because I could use
> Form_pg_sequence.sequence_name to bypass any catalog lookups at all, but
> per upthread that field should be ignored and can't be used.
>
>
>> As you since noticed you can't just
>> look into the sequence tuple and be happy with that, you need to do
>> pg_class lookups et al. Furhtermore callbacks can validly expect to have
>> a snapshot set.
>>
>
> Good point. Just because I don't need one doesn't mean others won't, and
> all the other callbacks do.
>
> I'll have a read over reorderbuffer.c and see if I can integrate this
> properly.
>
>
>> At the very least what you need to do is to SetupHistoricSnapshot(),
>> then lookup the actual identity of the sequence using
>> RelidByRelfilenode() and only then you can call the callback.
>>
>
> Yeah. That means it's safe for the callback to do a syscache lookup for
> the sequence name by oid.
>

I've revisited this patch in an attempt to get a corrected version ready
for the next CF.

Turns out it's not that simple.

Sequences advances aren't part of a transaction (though they always happen
during one) and proceed whether the xact that happened to trigger them
commits or rolls back. So it doesn't make sense to add them to a reorder
buffer for an xact. We want to send it immediately, not associate it with
some arbitrary xact that just happened to use the last value in the cache
that might not commit for ages.

But then when do we send it? If sent at the moment of decoding the advance
from WAL then it'll always be sent to the client between the commit of one
xact and the begin of another, because it'll be sent while we're reading
xlog and populating reorder buffers. For advance of an already-created
sequence advance that's what we want, but CREATE SEQUENCE, ALTER SEQUENCE
etc also log RM_SEQ_ID XLOG_SEQ_LOG operations. The xact that created the
sequence isn't committed yet so sending the advance to the downstream will
lead to attempting to advance a sequence that doesn't yet exist. Similarly
ALTER SEQUENCE emits a new XLOG_SEQ_LOG with the updated Form_pg_sequence.
All fine for physical replication but rather more challenging for logical
replication since sometimes we want the sequence change to be associated
with a particular xact and sometimes not.

So the reorder buffer has to keep track of sequences. It must check to see
if a catalog change is a sequence creation and if so mark the sequence as
pending, keeping a copy of the Form_pg_sequence that's updated to the
latest version as the xact proceeds and writes updates to the sequence. On
commit the sequence advance is replayed at the end of the xact using the
snapshot of the newly committed xact; on rollback it's discarded since the
sequence never became visible to anyone else. We can safely assert that
that sequence will not be updated by any other xact until this one commits.

In reorderbuffer.c, there's a test for relation->rd_rel->relkind ==
RELKIND_SEQUENCE that skips changes to sequence relations. We'll want to
replicate sequence catalog updates as DDL via event triggers and deparse so
that's OK, but I think we'll need to make a note of sequence creation here
to mark new sequences as uncommitted.

If we see a sequence change that isn't for an uncommitted newly-created
sequence we make an immediate call through the reorder buffer to send the
sequence update info to the client. That's OK even if it's something like
ALTER SEQUENCE ... INCREMENT 10 RENAME TO othername; . The ALTER ... RENAME
part gets sent with the xact that did it when/if it commits since it's part
of the pg_class tuple for the sequence; the INCREMENT gets sent immediately
since it's part of the Form_pg_sequence. That matches what happens locally,
and it means there's no need to keep track of every sequence, only new ones.

On commit or rollback of an xact that creates a sequence we pop the
sequence oid from the ordered list of uncommitted sequence oids that must
be kept by the decoding session.

If we land up decoding the same WAL again we might send sequence updates
that temporarily move a sequence backwards then advance it again when we
replay the newer updates. That's the same as hot standby and seems fine.
You obviously won't be able to safely get new values from sequences that're
replicated from an upstream on the downstream anyway - and I anticipate
that logical decoding receivers will probably want to use seqam etc later
to make those sequences read-only until a failover event.

Sound reasonable?

* keep track of the last-committed xact's snapshot in the decoding session

* decode relation create/drop for RELKIND_SEQUENCE and add/remove entries
to a list in the decoding session. It shouldn't get big enough to need a
hash map (?). Each entry points to the reorder buffer associated with the
xact that created the sequence.

* when decoding a XLOG_SEQ_LOG, check if the sequence is uncommitted in the
decoding state. If it is, replace the prior copy of the sequence state in
the reorder buffer for the xact that created the sequence with this one.
Otherwise invoke the sequence callback immediately.

* In the sequence decoding callback look up the last committed xact's
snapshot and establish that as the historical snapshot, then get the
sequence name from the syscache and invoke the client's callback.

* On commit of an xact whose reorder buffer has one or more newly created
sequences, invoke the sequence decode callback for each sequence last, just
before sending the commit.

I'm not sure if I'll get to this for 9.6 given that it's a whole bunch more
than "just decode the sequence advance when you see it in the xlogs". We'll
see.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vitaly Burovoy 2016-02-29 02:38:06 Re: [PATH] Correct negative/zero year in to_date/to_timestamp
Previous Message Joe Conway 2016-02-29 01:40:06 Re: Proposal: SET ROLE hook