Re: Replication identifiers, take 4

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication identifiers, take 4
Date: 2015-02-22 13:04:18
Message-ID: 54E9D3D2.4080103@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22/02/15 09:57, Andres Freund wrote:
> On 2015-02-19 00:49:50 +0100, Petr Jelinek wrote:
>> On 16/02/15 10:46, Andres Freund wrote:
>>> On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote:
>>>
>>>> At a quick glance, this basic design seems workable. I would suggest
>>>> expanding the replication IDs to regular 4 byte oids. Two extra bytes is a
>>>> small price to pay, to make it work more like everything else in the system.
>>>
>>> I don't know. Growing from 3 to 5 byte overhead per relevant record (or
>>> even 0 to 5 in case the padding is reused) is rather noticeable. If we
>>> later find it to be a limit (I seriously doubt that), we can still
>>> increase it in a major release without anybody really noticing.
>>>
>>
>> I agree that limiting the overhead is important.
>>
>> But I have related though about this - I wonder if it's worth to try to map
>> this more directly to the CommitTsNodeId.
>
> Maybe. I'd rather go the other way round though;
> replication_identifier.c/h's stuff seems much more generic than
> CommitTsNodeId.
>

Probably not more generic but definitely nicer as the nodes are named
and yes it has richer API.

>> I mean it is currently using that
>> for the actual storage, but I am thinking of having the Replication
>> Identifiers be the public API and the CommitTsNodeId stuff be just hidden
>> implementation detail. This thought is based on the fact that CommitTsNodeId
>> provides somewhat meaningless number while Replication Identifiers give us
>> nice name for that number. And if we do that then the type should perhaps be
>> same for both?
>
> I'm not sure. Given that this is included in a significant number of
> recordsd I'd really rather not increase the overhead as described
> above. Maybe we can just limit CommitTsNodeId to the same size for now?
>

That would make sense.

I also wonder about the default nodeid infrastructure the committs
provides. I can't think of use-case which it can be used for and which
isn't solved by replication identifiers - in the end the main reason I
added that infra was to make it possible to write something like
replication identifiers as part of an extension. In any case I don't
think the default nodeid can be used in parallel with replication
identifiers since one will overwrite the SLRU record for the other.
Maybe it's enough if this is documented but I think it might be better
if this patch removed that default committs nodeid infrastructure. It's
just few lines of code which nobody should be using yet.

Thinking about this some more and rereading the code I see that you are
setting TransactionTreeSetCommitTsData during xlog replay, but not
during transaction commit, that does not seem correct as the local
records will not have any nodeid/origin.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-02-22 13:05:27 Re: SSL renegotiation
Previous Message Andres Freund 2015-02-22 12:22:11 Re: hash agg is slower on wide tables?