Re: Changeset Extraction v7.0 (was logical changeset generation)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changeset Extraction v7.0 (was logical changeset generation)
Date: 2014-01-15 20:39:01
Message-ID: 20140115203901.GF8653@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2014-01-15 13:28:25 -0500, Robert Haas wrote:
> - I think you should just regard ReplicationSlotCtlLock as protecting
> the "name" and "active" flags of every slot. ReplicationSlotCreate()
> would then not need to mess with the spinlocks at all, and
> ReplicationSlotAcquire and ReplicationSlotDrop get a bit simpler too I
> think. Functions like ReplicationSlotsComputeRequiredXmin() can
> acquire this lock in shared mode and only lock the slots that are
> actually active, instead of all of them.

I first thought you meant that we should get rid of the spinlock, but
after rereading I think you just mean that ->name, ->active, ->in_use
are only allowed to change while holding the lwlock exclusively so we
don't need to spinlock in those cases? If so, yes, that works for me.

> - If you address /* FIXME: apply sanity checking to slot name */, then
> I think that also addresses /* XXX: do we want to use truncate
> identifier instead? */. In other words, let's just error out if the
> name is too long. I'm not sure what other sanity checking is needed
> here; maybe just restrict it to 7-bit characters to avoid problems
> with encodings for individual databases varying.

Yea, erroring out seems like a good idea. But I think we need to
restrict slot names a bit more than that, given they are used as
filenames... We could instead name the files using the slot's offset,
but I'd prefer to not go that route.

> - ReplicationSlotAcquire probably needs to ignore slots that are not active.

Not sure what you mean? If the slot isn't in_use we'll skip it in the loop.

> - If there's a coding rule that slot->database can't be changed while
> the slot is active, then the check to make sure that the user isn't
> trying to bind to a slot with a mis-matching database could be done
> before the code described in the previous point, avoiding the need to
> go back and release the resource.

I don't think slot->database should be allowed to change at all...

> - I think the critical section in ReplicationSlotDrop is bogus. If
> DeleteSlot() fails, we scarcely need to PANIC. The slot just isn't
> gone.

Well, if delete slot fails, we don't really know at which point it
failed which means that the on-disk state might not correspond to the
in-memory state. I don't see a point in adding code trying to handle
that case correctly...

> - cancel_before_shmem_exit is only guaranteed to remove the
> most-recently-added callback.

Yea :(. I think that's safe for the current usages but seems mighty
fragile... Not sure what to do about it. Just register KillSlot once and
keep it registered?

> - Why does ReplicationSlotsComputeRequiredXmin() need to hold
> ProcArrayLock at all?

There's reasoning, but I just noticed that it's basis might be flawed
anyway :(.
When starting changeset extraction in a new slot we need to guarantee
that we only start decoding records we know the catalog tuples haven't
been removed for.
So, when creating the slot I've so far done a GetOldestXmin() and used
that to check against xl_running_xact->oldestRunningXid. But
GetOldestXmin() can go backwards...

I'll think a bit and try to simplify this.

> - ReplicationSlotsComputeRequiredXmin scarcely needs to hold the
> spinlock while it does all of those gyrations. It can just acquire
> the spinlock, copy the three fields needed into local variables, and
> release the spinlock. The rest can be worked out afterwards.
> Similarly in ReplicationSlotsComputeRequiredXmin.

Yea, will change.

> - A comment in KillSlot wonders whether locking is required. I say
> yes. It's safe to take lwlocks and spinlocks during shmem exit, and
> failing to do so seems like a recipe for subtle corner-case bugs.

I agree that it's safe to use spinlocks, but lwlocks? What if we are
erroring out while holding an lwlock? Code that runs inside a
TransactionCommand is protected against that, but if not ProcKill()
which invokes LWLockReleaseAll() runs pretty late in the teardown
process...

> - pg_get_replication_slots() wonders what permissions we require. I
> don't know that any special permissions are needed here; the data
> we're exposing doesn't appear to be sensitive. Unless I'm missing
> something?

I don't see a problem either, but it seems others have -
pg_stat_replication only displays minor amounts of information if one
doesn't have the replication privilege... Not sure what the reasoning
there is, and whether it applies here as well.

> - There seems to be no interface to acquire or release slots from
> either SQL or the replication protocol, nor any way for a client of
> this code to update its slot details.

I don't think either ever needs to do that - START_TRANSACTION SLOT slot
...; and decoding_slot_*changes will acquire/release for them while
active. What would the usecase be to allow them to be acquired from SQL?

The slot details get updates by the respective replication code. For
streaming rep, that should happen via reply and feedback messages. For
changeset extraction it happens when LogicalConfirmReceivedLocation() is
called; the walsender interface does that using reply messages, the SQL
interface calls it when finished (unless you use the _peek_ functions).

> The value of
> catalog_xmin/data_xmin vs. effective_catalog_xmin/effective_data_xmin
> is left to the imagination.

There's a comment about them in a following patch. Basically the reason
is that for changeset extraction we cannot adjust the in-memory value
before we know the changed slot status is safely synced to
disk. Otherwise a client could restart streaming at a LSN where the
corresponding catalog details are gone since it's not prevented by the
slot anymore.

That could be done by just holding some lock forbidding the global xmin
value to be recomputing while writing to disk, but that seems awfully
heavy-handed. So the protocol is:
1) update ->catalog_xmin to the new xmin,
2) sync slot to disk
3) set ->effective_catalog_xmin = ->catalog_xmin
4) ReplicationSlotsComputeRequiredXmin()

Since ComputeRequiredXmin() only looks at effective_catalog_xmin that
guarantees that the global xmin horizon doesn't increase before the the
slot has been synced to disk. If we crash after 2,
StartupReplicationSlots() simply sets effective_catalog_xmin =
catalog_xmin, we know it's safely on disk now since we've just read it
from there.

Thanks for committing 0001!

Regards,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2014-01-15 20:45:35 Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Previous Message Stephen Frost 2014-01-15 20:35:21 Re: Why conf.d should be default, and auto.conf and recovery.conf should be in it