Re: Replication identifiers, take 4

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication identifiers, take 4
Date: 2015-04-07 15:08:16
Message-ID: 20150407150816.GF12291@awork2.anarazel.de
Lists: pgsql-hackers

On 2015-03-24 23:11:26 -0400, Robert Haas wrote:
> On Mon, Feb 16, 2015 at 4:46 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> At a quick glance, this basic design seems workable. I would suggest
> >> expanding the replication IDs to regular 4 byte oids. Two extra bytes is a
> >> small price to pay, to make it work more like everything else in the system.
> >
> > I don't know. Growing from 3 to 5 byte overhead per relevant record (or
> > even 0 to 5 in case the padding is reused) is rather noticeable. If we
> > later find it to be a limit (I seriously doubt that), we can still
> > increase it in a major release without anybody really noticing.
>
> You might notice that Heikki is making the same point here that I've
> attempted to make multiple times in the past: limiting the replication
> identifier to 2 bytes because that's how much padding space you happen
> to have available is optimizing for the wrong thing. What we should
> be optimizing for is consistency and uniformity of design. System
> catalogs have OIDs, so this one should, too. You're not going to be
> able to paper over the fact that the column has some funky data type
> that is unlike every other column in the system.
>
> To the best of my knowledge, the statement that there is a noticeable
> performance cost for those 2 extra bytes is also completely
> unsupported by any actual benchmarking.

I'm starting benchmarks now.

But I have to say: I find the idea that you'd need more than 2^16
identifiers anytime soon not very credible. The likelihood that
replication identifiers are the limiting factor towards that seems
incredibly small. Just consider how you'd apply changes from so many
remotes; how to stream changes to them; how to even configure such a
complex setup. We can easily change the size limits in the next major
release without anybody being inconvenienced.

We've gone to quite some lengths to reduce the overhead of WAL. I
don't understand why it's important that we make no compromises
here when that doesn't matter elsewhere.

Greetings,

Andres Freund
