Re: Changeset Extraction v7.1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changeset Extraction v7.1
Date: 2014-01-24 17:10:50
Message-ID: CA+Tgmoa+sZXGt32O3Dp5u+8f6iXYzk7teJp2rNcFeCVxd5uJHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 23, 2014 at 6:32 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> I also wonder if we should use the
>> terminology "attach" instead of "acquire"; that pairs more naturally
>> with "release". Then the message, if we want more than an assert,
>> might be "this backend is already attached to a replication slot".
>
> I went with Acquire/Release because our locking code does so, and it
> seemed sensible to be consistent. I don't have strong feelings about it.

Yeah, but I think a slot is not really the same thing as a lock.
Acquire/release might be OK. In some of my recent code I used
attach/detach, which feels a little more natural to me for something
like this, so I lean that direction.

> Unfortunately not. Inside the walsender there's currently no
> LWLockReleaseAll() for ERRORs since commands aren't run inside a
> transaction command...
>
> But maybe I should have fixed this by adding the release to
> WalSndErrorCleanup() instead? That'd still leave the problematic case
> that currently we try to delete a replication slot inside a CATCH when
> we fail while initializing the rest of logical replication... But I
> guess adding it would be a good idea independent of that.

+1. I think that if we can't rely on error handling to clean up the
same things everywhere, it's gonna be a mess. People won't be able to
keep track of which error cleanup is engaged in which code paths, and
screw-ups will result when old code paths are called from new call
sites.

> We could also do a StartTransactionCommand() but I'd rather not, that
> currently prevents code in that vicinity from doing anything it
> shouldn't via various Assert()s in existing code.

Like what? I mean, I'm OK with having a partial error-handling
environment if that's all we need, but I think it's a bad plan to the
extent that the code here needs to be aware of error-handling
differences versus expectations elsewhere in our code.

>> This doesn't need the LWLockRelease either. It does need the
>> SpinLockRelease, but as I think I noted previously, the right way to
>> write this is probably: SpinLockAcquire(&slot->mutex); was_active =
>> slot->active; slot->active = true; SpinLockRelease(&slot->mutex); if
>> (was_active) ereport(). MyReplicatonSlot = slot.
>
> That's not really simpler tho? But if you prefer I can go that way.

It avoids a branch while holding the lock, and it puts the
SpinLockAcquire/Release pair much closer together, so it's easier to
visually verify that the lock is released in all cases.

>> ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and
>> the comment "Provide interlock against concurrent recomputations"
>> doesn't seem adequate to me. I guess the idea here is that we regard
>> ProcArrayLock as protecting ReplicationSlotCtl->catalog_xmin and
>> ReplicationSlotCtl->data_xmin, but if that's the idea then we only
>> need to hold the lock during the time when we actually update those
>> values, not the loop where we compute them.
>
> There's a comment someplace else to that end, but yes, that's
> essentially the idea. I decided to take it during the whole
> recomputation because we also take ProcArrayLock when creating a new
> decoding slot and initially setting ->catalog_xmin. That's not strictly required
> but seemed simpler that way, and the code shouldn't be very hot.
> The code that initially computes the starting value for catalog_xmin
> when creating a new decoding slot has to take ProcArrayLock to be safe,
> that's why I though it'd be convenient to always use it for those
> values.

I don't really see why it's simpler that way. It's clearer what the
point of the lock is if you only hold it for the operations that need
to be protected by that lock.

> In all other cases where we modify *_xmin we're only increasing it which
> doesn't need a lock (HS feedback never has taken one, and
> GetSnapshotData() modifies ->xmin while holding a shared lock), the only
> potential danger is a slight delay in increasing the overall value.

Right. We might want to comment such places.

>> Also, if that's the
>> design, maybe they should be part of PROC_HDR *ProcGlobal rather than
>> here. It seems weird to have some of the values protected by
>> ProcArrayLock live in a completely different data structure managed
>> almost entirely by some other part of the system.
>
> Don't we already have cases of that? I seem to remember so. If you
> prefer having them there, I am certainly fine with doing that. This way
> they aren't allocated if slots are disabled but it's just two
> TransactionIds.

Let's go for it, unless we think of a reason not to.

>> It's pretty evident that what's currently patch #4 (only peg the xmin
>> horizon for catalog tables during logical decoding) needs to become
>> patch #1, because it doesn't make any sense to apply this before we do
>> that.
>
> Well, the slot code and the the slot support for streaming rep are
> independent from and don't use it. So they easily can come before it.

But this code is riddled with places where you track a catalog xmin
and a data xmin separately. The only point of doing it that way is to
support a division that hasn't been made yet.

>> [ discussion about crash safety of slots and their use of PANIC ]
>> Broadly, what you're trying to accomplish here is to have something
>> that is crash-safe but without relying on WAL, so that it can work on
>> standbys. If making things crash-safe without WAL were easy to do, we
>> probably wouldn't have WAL at all, so it stands to reason that there
>> are going to be some difficulties here.
>
> My big problem here is that you're asking this code to have *higher*
> guarantees than WAL ever had and currently has, not equivalent
> guarantees. Even though the likelihood of hitting problems is a least a
> magnitude or two smaller as we are dealing with minimal amounts of data.
> All the situations that seem halfway workable in the nearby thread about
> PANIC in XLogInsert() you reference are rough ideas that reduce the
> likelihood of PANICs, not remove them.
>
> I am fine with reworking things so that the first operation of several
> doesn't PANIC because we still can clearly ERROR out in that case. That
> should press the likelihood of problems into the utterly irrelevant
> area. E.g. ERROR for the rename(oldpath, newpath) and then start a
> critical section for the fsync et al.

I have zero confidence that it's OK to treat fsync() as an operation
that won't fail. Linux documents EIO as a plausible error return, for
example. (And really, how could it not?)

>> Calling a slot "old" or "new" looks liable to cause problems. Maybe
>> change those names to contain a character not allowed in a slot name,
>> if we're going to keep doing it that way.
> I wondered about making them plain files as well but given the need for
> a directory independent from this I don't really see the advantage,
> we'll need to handle them anyway during cleanup.

Yeah, sure, but if it makes for fewer in-between states, it might be worth it.

>> How about letting the xmins of such backends affect the computation as normal, and then
>> having one extra xmin that gets folded in that represents the minima
>> of the xmin of unconnected slots?
>
> That's how I had it in the beginning but it turned out that has
> noticeable performance/space impact. Surprising isn't it? The reason is
> that we'll intermittently use normal snapshots to look at the catalog
> during decoding and they will install a xmin the current proc. So, while
> that snapshot is active GetSnapshotData() will return an older xmin
> preventing HOT pruning from being as efficient.

Hrm, so you still need the flag, to indicate whether the xmin should
be included when we're computing a globalxmin for pruning of a
non-catalog table. But that doesn't necessarily mean that the value
has to live in the slot rather than the PGXACT, does it? It might be
for the best the way you have it, but it does look kind of weird. Not
sure yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2014-01-24 17:14:13 Re: UNION ALL on partitioned tables won't use indices.
Previous Message Thom Brown 2014-01-24 17:09:49 Re: Minmax indexes