Re: Changeset Extraction v7.6.1

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changeset Extraction v7.6.1
Date: 2014-02-15 23:59:46
Message-ID: 20140215235946.GC7821@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
> On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > [ new patches ]
>
> 0001 already needs minor

Hm?

If there are conflicts, I'll push/send a rebased tomorrow or monday.

> + * the transaction's invalidations. This currently won't be needed if
> + * we're just skipping over the transaction, since that currently only
> + * happens when starting decoding, after we invalidated all caches, but
> + * transactions in other databases might have touched shared relations.
>
> I'm having trouble understanding what this means, especially the part
> after the "but".

Let me rephrase, maybe that will help:

This currently won't be needed if we're just skipping over the
transaction because currenlty we only do so during startup, to get to
the first transaction the client needs. As we have reset the catalog
caches before starting to read WAL and we haven't yet touched any
catalogs there can't be anything to invalidate.

But if we're "forgetting" this commit because it's it happened in
another database, the invalidations might be important, because they
could be for shared catalogs and we might have loaded data into the
relevant syscaches.

Better?

> + if (IsTransactionState() &&
> + GetTopTransactionIdIfAny() != InvalidTransactionId)
> + ereport(ERROR,
> + (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
> + errmsg("cannot perform changeset
> extraction in transaction that has performed writes")));
>
> This is sort of an awkward restriction, as it makes it hard to compose
> this feature with others. What underlies the restriction, can we get
> rid of it, and if not can we include a comment here explaining why it
> exists?

Well, you can write the result into a table if you're halfway
careful... :)

I think it should be fairly easy to relax the restriction to creating a
slot, but not getting data from it. Do you think that would that be
sufficient?

> + /* register output plugin name with slot */
> + strncpy(NameStr(slot->data.plugin), plugin,
> + NAMEDATALEN);
> + NameStr(slot->data.plugin)[NAMEDATALEN - 1] = '\0';
>
> If it's safe to do this without a lock, I don't know why.

Well, the worst that could happen is that somebody else doing a SELECT *
FROM pg_replication_slot gets a incomplete plugin name... But we
certainly can hold the spinlock during it if you think that's better.

> More broadly, I wonder why this is_init stuff is in here at all.
> Maybe the stuff that happens in the is_init case should be done in the
> caller, or another helper function.

The reason I moved it in there is that after the walsender patch there
are two callers and the stuff is sufficiently complex that I having it
twice lead to bugs.
The reason it's currenlty the same function is that sufficiently much of
the code would have to be shared that I found it it ugly to split. I'll
have a look whether I can figure something out.

> + /* prevent WAL removal as fast as possible */
> + ReplicationSlotsComputeRequiredLSN();
>
> If there's a race here, can't we rejigger the order of operations to
> eliminate it? Or is that just too hard and not worth it?

Yes, there's a small race which at the very least should be properly
documented.

Hm. Yes, I think we can plug the hole. If the race condition occurs we'd
take slightly longer to startup, which isn't bad. Will fix.

> +begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> + state.callback_name = "pg_decode_begin_txn";
> + ctx->callbacks.begin_cb(ctx, txn);
>
> I feel that there's a certain lack of naming consistency between these
> things. Can we improve that? (and similarly for parallel cases)
>
> +pg_create_decoding_replication_slot(PG_FUNCTION_ARGS)
>
> I thought we were going to have physical and logical slots, not
> physical and decoding slots.

Ok.

> + /* make sure we don't end up with an unreleased slot */
> + PG_TRY();
> + {
> ...
> + PG_CATCH();
> + {
> + ReplicationSlotRelease();
> + ReplicationSlotDrop(NameStr(*name));
> + PG_RE_THROW();
> + }
> + PG_END_TRY();
>
> I don't think this is a very good idea. The problem with doing things
> during error recovery that can themselves fail is that you'll lose the
> original error, which is not cool, and maybe even blow out the error
> stack. Many people have confuse PG_TRY()/PG_CATCH() with an
> exception-handling system, but it's not. One way to fix this is to
> put some of the initialization logic in ReplicationSlotCreate() just
> prior to calling CreateSlotOnDisk(). If the work that needs to be
> done is too complex or protracted to be done there, then I think that
> it should be pulled out of the act of creating the replication slot
> and made to happen as part of first use, or as a separate operation
> like PrepareForLogicalDecoding.

I think what should be done here is adding a drop_on_release flag. As
soon as everything important is done, it gets unset.

With some small changes that'd be highly useful for pg_basebackup as
well, because it could simply create a slot to prevent removal of
important WAL. Hm, maybe name it release_on_error, and call it from the
error locations.
You previously (IM?) argued that that'd be problematic because of locks,
but in all the error handling situations we'll already have released
lwlocks. And we rely on being able to tacke lwlocks there,
c.f. TerminateBufferIO et al.

> + * When the client has confirmed flushes >= candidate_xmin_after we can
>
> candidate_xmin_after is not otherwise referenced; incomplete identifier rename?
>
> Nothing in patch 1 sets PROC_IN_LOGICAL_DECODING. Is that right?

CreateDecodingContext() does.

> ReorderBufferGetTXN() should get a comment about the performance
> impact of this. There's a tiny bit there in ReorderBufferReturnTXN()
> but it should be better called out. Should these call the valgrind
> macros to make the memory inaccessible while it's being held in cache?

Hm, I think it does call the valgrind stuff?
VALGRIND_MAKE_MEM_UNDEFINED(txn, sizeof(ReorderBufferTXN));
VALGRIND_MAKE_MEM_DEFINED(&txn->node, sizeof(txn->node));

> My eyes are starting to glaze over, so more later.

Thanks! Will address asap.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2014-02-16 00:05:11 Re: [PATCH] Relocation of tablespaces in pg_basebackup
Previous Message Andres Freund 2014-02-15 23:32:57 Re: narwhal and PGDLLIMPORT