Re: [PATCH] Logical decoding support for sequence advances

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: [PATCH] Logical decoding support for sequence advances
Date: 2016-03-15 05:18:59
Message-ID: CAMsr+YHF6-gUhzXYZLo3gEb8ceSP7_6zbm4vrWjwSCsiDEApPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 11 March 2016 at 22:24, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> On 02/03/16 08:05, Craig Ringer wrote:
>
>> On 1 March 2016 at 05:30, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>>
>

>
>>
>> I wonder if it would be acceptable to create new info flag for
>> RM_SEQ_ID that would behave just like XLOG_SEQ_LOG but would be used
>> only for the nontransactional updates (nextval) so that decoding
>> could easily differentiate between transactional and
>> non-transactional update of sequence and then just either call the
>> callback immediately or add the change to reorder buffer based on
>> that. The redo code could just have simple OR expression to behave
>> same with both of the info flags.
>>
>>
>> That's much cleaner than trying to keep track of sequence creations and
>> really pretty harmless. I'll give that a go and see how it looks.
>>
>> Seems like simpler solution than building all the tracking code on
>> the decoding side to me.
>>
>>
>> +1
>>
>>
> Except this won't work for sequences that have been created in same
> transaction as the nextval()/setval() was called because in those cases we
> don't want to decode the advancement of sequence until the end of
> transaction and we can't map the relfilenode to sequence without going
> through reorder buffer in those cases either
>

I'll explain this a bit (for when I forget all about it and come back to it
confused, or if someone else picks this up):

The issue is transactions like

BEGIN;
CREATE TABLE blah (id serial primary key, something text);
INSERT INTO blah (something) SELECT .... ;
COMMIT;

Here we create the sequence, then we advance the sequence in subsequent
statements that're part of the same xact but not directly connected to the
sequence creation. There's no convenient way to tell, when we see the
Form_pg_sequence updates in WAL for the newly created sequence, that it's
for a not-yet-committed xact so we shouldn't send the advance to the client
yet.

Once the xact that created the sequence commits we have to make sure we
send its latest state, not the initial state when it was created. So with
the above proposal we'd still need to look up those new-info-flagged
entries against a map of uncommitted sequences by relfilenode and decide
whether to send it immediately or update the latest state of an uncommitted
sequence in a reorder buffer.

IOW we have to do pretty much what I described before. We can still log
sequence updates with a different info flag but we need to know how to
associate the record with the xact that created it, so we have to log the
creating xid in the record for the initial state of a newly created
sequence. At least that'd be less ugly than trying to peek at decoded
catalog updates in the reorder buffer to spot new sequence creation and can
be done only when wal_level = logical, but it'd mean that the two record
types were different in more than just info flag.

The other wrinkle Petr refers to is that when decoding XLOG_SEQ_LOG we only
have a relfilenode. We don't know the oid of the sequence, which we need to
look up its name. The reorder buffer code uses RelidByRelfilenode for that,
which requires a snapshot. I'm not sure what problem that poses, since we'd
obviously need a snapshot set up to look up the name by oid anyway and we'd
be using the most recently committed historic snapshot for both.

Anyway, this is still complicated because of the mess with sequences being
both transactional and not-transactional in ways that rely on how the low
level storage and WAL works.

Unfortunately I don't expect to have time to produce a new patch for 9.6.

(BTW, I'd be interested in seeing what code breaks if we introduced a
compile option in src/include/pg_config_manual.h to force oid and
relfilenode randomization rather than starting off with them being the
same.)

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ioseph Kim 2016-03-15 05:36:45 Re: propose: detail binding error log
Previous Message Amit Kapila 2016-03-15 05:17:12 Re: Speed up Clog Access by increasing CLOG buffers