Re: [PATCH] Logical decoding timeline following take II

From: Andres Freund <andres(at)anarazel(dot)de>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Logical decoding timeline following take II
Date: 2016-11-12 15:07:57
Message-ID: 20161112150757.qgy5cw4e7buxm2my@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-10-24 17:49:13 +0200, Petr Jelinek wrote:
> + * Determine which timeline to read an xlog page from and set the
> + * XLogReaderState's state->currTLI to that timeline ID.

"XLogReaderState's state->currTLI" - the state's a bit redundant.

> + * The caller must also make sure it doesn't read past the current redo pointer
> + * so it doesn't fail to notice that the current timeline became historical.
> + */

Not sure what that means? The redo pointer usually menas the "logical
start" of the last checkpoint, but I don't see how thta could be meant
here?

> +static void
> +XLogReadDetermineTimeline(XLogReaderState *state, XLogRecPtr wantPage, uint32 wantLength)
> +{
> + const XLogRecPtr lastReadPage = state->readSegNo * XLogSegSize + state->readOff;
> +
> + elog(DEBUG4, "Determining timeline for read at %X/%X+%X",
> + (uint32)(wantPage>>32), (uint32)wantPage, wantLength);

Doing this on every single read from a page seems quite verbose. It's
also (like most or all the following debug elogs) violating the casing
prescribed by the message style guidelines.

> + /*
> + * If the desired page is currently read in and valid, we have nothing to do.
> + *
> + * The caller should've ensured that it didn't previously advance readOff
> + * past the valid limit of this timeline, so it doesn't matter if the current
> + * TLI has since become historical.
> + */
> + if (lastReadPage == wantPage &&
> + state->readLen != 0 &&
> + lastReadPage + state->readLen >= wantPage + Min(wantLength,XLOG_BLCKSZ-1))
> + {
> + elog(DEBUG4, "Wanted data already valid"); //XXX
> + return;
> + }

With that kind of comment/XXX present, this surely can't be ready for
committer?

> + /*
> + * If we're reading from the current timeline, it hasn't become historical
> + * and the page we're reading is after the last page read, we can again
> + * just carry on. (Seeking backwards requires a check to make sure the older
> + * page isn't on a prior timeline).
> + */

How can ThisTimeLineID become historical?

> + if (state->currTLI == ThisTimeLineID && wantPage >= lastReadPage)
> + {
> + Assert(state->currTLIValidUntil == InvalidXLogRecPtr);
> + elog(DEBUG4, "On current timeline");
> + return;
> + }

Also, is it actually ok to rely on ThisTimeLineID here? That's IIRC not
maintained outside of the startup process, until recovery ended (cf it
being set in InitXLOGAccess() called via RecoveryInProgress()).

> /*
> - * TODO: we're going to have to do something more intelligent about
> - * timelines on standbys. Use readTimeLineHistory() and
> - * tliOfPointInHistory() to get the proper LSN? For now we'll catch
> - * that case earlier, but the code and TODO is left in here for when
> - * that changes.
> + * Check which timeline to get the record from.
> + *
> + * We have to do it each time through the loop because if we're in
> + * recovery as a cascading standby, the current timeline might've
> + * become historical.
> */

I guess you mean cascading as "replicating logically from a physical
standby"? I don't think it's good to use cascading here, because that's
usually thought to mean doing physical SR from a standby...

> diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
> index 318726e..4315fb3 100644
> --- a/src/backend/replication/logical/logicalfuncs.c
> +++ b/src/backend/replication/logical/logicalfuncs.c
> @@ -234,12 +234,6 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
> rsinfo->setResult = p->tupstore;
> rsinfo->setDesc = p->tupdesc;
>
> - /* compute the current end-of-wal */
> - if (!RecoveryInProgress())
> - end_of_wal = GetFlushRecPtr();
> - else
> - end_of_wal = GetXLogReplayRecPtr(NULL);
> -
> ReplicationSlotAcquire(NameStr(*name));
>
> PG_TRY();
> @@ -279,6 +273,12 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
> /* invalidate non-timetravel entries */
> InvalidateSystemCaches();
>
> + if (!RecoveryInProgress())
> + end_of_wal = GetFlushRecPtr();
> + else
> + end_of_wal = GetXLogReplayRecPtr(NULL);
> +
> + /* Decode until we run out of records */
> while ((startptr != InvalidXLogRecPtr && startptr < end_of_wal) ||
> (ctx->reader->EndRecPtr != InvalidXLogRecPtr && ctx->reader->EndRecPtr < end_of_wal))
> {

That seems like a pretty random change?

> diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
> index deaa7f5..caff9a6 100644
> --- a/src/include/access/xlogreader.h
> +++ b/src/include/access/xlogreader.h
> @@ -27,6 +27,10 @@
>
> #include "access/xlogrecord.h"
>
> +#ifndef FRONTEND
> +#include "nodes/pg_list.h"
> +#endif

Why is that needed?
> diff --git a/src/test/modules/test_slot_timelines/README b/src/test/modules/test_slot_timelines/README
> new file mode 100644
> index 0000000..585f02f
> --- /dev/null
> +++ b/src/test/modules/test_slot_timelines/README
> @@ -0,0 +1,19 @@
> +A test module for logical decoding failover and timeline following.
> +
> +This module provides a minimal way to maintain logical slots on replicas
> +that mirror the state on the master. It doesn't make decoding possible,
> +just tracking slot state so that a decoding client that's using the master
> +can follow a physical failover to the standby. The master doesn't know
> +about the slots on the standby, they're synced by a client that connects
> +to both.
> +
> +This is intentionally not part of the test_decoding module because that's meant
> +to serve as example code, where this module exercises internal server features
> +by unsafely exposing internal state to SQL. It's not the right way to do
> +failover, it's just a simple way to test it from the perl TAP framework to
> +prove the feature works.
> +
> +In a practical implementation of this approach a bgworker on the master would
> +monitor slot positions and relay them to a bgworker on the standby that applies
> +the position updates without exposing slot internals to SQL. That's too complex
> +for this test framework though.

I still don't want to have this code in postgres, it's just an example
for doing something bad. Lets get proper feedback control in, and go
from there.

> diff --git a/src/test/modules/test_slot_timelines/expected/load_extension_1.out b/src/test/modules/test_slot_timelines/expected/load_extension_1.out
> new file mode 100644
> index 0000000..0db21e4
> --- /dev/null
> +++ b/src/test/modules/test_slot_timelines/expected/load_extension_1.out
> @@ -0,0 +1,7 @@
> +CREATE EXTENSION test_slot_timelines;
> +SELECT test_slot_timelines_create_logical_slot('test_slot', 'test_decoding');
> +ERROR: replication slots can only be used if max_replication_slots > 0
> +SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current()::text::xid, txid_current()::text::xid, pg_current_xlog_location(), pg_current_xlog_location());
> +ERROR: replication slots can only be used if max_replication_slots > 0
> +SELECT pg_drop_replication_slot('test_slot');
> +ERROR: replication slots can only be used if max_replication_slots > 0
> diff --git a/src/test/modules/test_slot_timelines/sql/load_extension.sql b/src/test/modules/test_slot_timelines/sql/load_extension.sql
> new file mode 100644
> index 0000000..2440355

It doesn't seem worthwhlie to maintain a test for this - why do we run
the tests with those settings in the first place?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2016-11-12 15:17:37 Re: pg_dump, pg_dumpall and data durability
Previous Message Gilles Darold 2016-11-12 13:59:44 Re: Patch to implement pg_current_logfile() function