Re: make async slave to wait for lsn to be replayed

From: i(dot)kartyshov(at)postgrespro(dot)ru
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: make async slave to wait for lsn to be replayed
Date: 2017-09-04 12:26:47
Message-ID: 0a26d4c3-b100-3274-cfda-f85940cfa974@postgrespro.ru
Lists: pgsql-hackers

Hello, thank you for your comments over main idea and code.

On 13.03.2017 05:20, Thomas Munro wrote:
1)
> First, I'll restate my view of the syntax-vs-function question: I
> think an fmgr function is the wrong place to do this, because it
> doesn't work for our 2 higher isolation levels as mentioned. Most
> people probably use READ COMMITTED most of the time so the extension
> version you've proposed is certainly useful for many people and I like
> it, but I will vote against inclusion in core of any feature that
> doesn't consider all three of our isolation levels, especially if
> there is no way to extend support to other levels later. I don't want
> PostgreSQL to keep adding features that eventually force everyone to
> use READ COMMITTED because they want to use feature X, Y or Z.

Waiting for an LSN is expected to be used on a hot standby, which is
read-only. Only READ COMMITTED and REPEATABLE READ are allowed on a hot
standby, so SERIALIZABLE does not come into play there.

> Maybe someone can think of a clever way for an extension to insert a
> wait for a user-supplied LSN *before* acquiring a snapshot so it can
> work for the higher levels, or maybe the hooks should go into core
> PostgreSQL so that the extension can exist as an external project not
> requiring a patched PostgreSQL installation, or maybe this should be
> done with new core syntax that extends transaction commands. Do other
> people have views on this?

I think it is a good idea, but it is not clear to me how to do it
properly.

2)
> +wl_lsn_updated_hook(void)
> +{
> + uint i;
> + /*
> + * After update lastReplayedEndRecPtr set Latches in SHMEM array
> + */
> + if (counter_waitlsn % count_waitlsn == 0 ||
> + TimestampDifferenceExceeds(time_waitlsn, GetCurrentTimestamp(), interval_waitlsn))
> + {
>
> Doesn't this mean that if you are waiting for LSN 1234, and the
> primary sends that LSN and then doesn't send anything for another
> hour, a standby waiting in pg_waitlsn is quite likely to skip that
> update (either because of count_waitlsn or interval_waitlsn), and then
> finish up waiting for a very long time?
>
> I'm not sure if this is a good idea, but it's an idea: You could keep
> your update skipping logic, but make sure you always catch the
> important case where recovery hits the end of WAL, by invoking your
> callback from WaitForWALToBecomeAvailable. It could have a boolean
> parameter that means 'don't skip this one!'. In other words, it's OK
> to skip updates, but not if you know there is no more WAL available to
> apply (since you have no idea how long it will be for more to arrive).
>
> Calling GetCurrentTimestamp() at high frequency (after every WAL
> record is replayed) may be a bad idea. It's a slow system call on
> some operating systems. Can we use an existing timestamp for free,
> like recoveryLastXTime? Remembering that the motivation for using
> this feature is to wait for *whole transactions* to be replayed and
> become visible, there is no sensible reason to wait for random WAL
> positions inside a transaction, so if you used that then you would
> effectively skip all non-COMMIT records and also skip some COMMIT
> records that are coming down the pipe too fast.

Yes, I applied this idea and prepared a new patch.
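
To make the end-of-WAL case concrete, here is a rough sketch of how the
hook could take a "force" flag, along the lines you suggested. Only the
counters come from the patch excerpt above; wl_wake_waiters() and
last_replay_time are hypothetical names, and the real patch may arrange
this differently:

static void
wl_lsn_updated_hook(bool force)
{
	/*
	 * Wake waiters when the skip thresholds are met, or unconditionally
	 * when the caller knows no more WAL is available to apply (i.e. when
	 * invoked from WaitForWALToBecomeAvailable with force = true).  A
	 * cached timestamp such as recoveryLastXTime avoids calling
	 * GetCurrentTimestamp() for every replayed record.
	 */
	if (force ||
		counter_waitlsn % count_waitlsn == 0 ||
		TimestampDifferenceExceeds(time_waitlsn, last_replay_time,
								   interval_waitlsn))
		wl_wake_waiters();		/* set the latches of waiting backends */

	counter_waitlsn++;
}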

3)
> +static void
> +wl_own_latch(void)
> +{
> + SpinLockAcquire(&state->l_arr[MyBackendId].slock);
> + OwnLatch(&state->l_arr[MyBackendId].latch);
> + is_latch_owned = true;
> +
> + if (state->backend_maxid < MyBackendId)
> + state->backend_maxid = MyBackendId;
> +
> + state->l_arr[MyBackendId].pid = MyProcPid;
> + SpinLockRelease(&state->l_arr[MyBackendId].slock);
> +}
>
> What is the point of using extra latches for this? Why not just use
> the standard latch? Syncrep and many other things do that. I'm not
> actually sure if there is ever a reason to create more latches in
> regular backends. SIGUSR1 will be delivered and set the main latch
> anyway.
>
> There are cases of specialised latches in the system, like the wal
> receiver latch, and I'm reliably informed that that solves problems
> like delivering a wakeup message without having to know which backend
> is currently the wal receiver (a problem we might consider solving
> today with a condition variable?) I don't think anything like that
> applies here.

In my case I create an array of shared latches, one per backend. I'll
think about how to use the standard latch in an efficient way instead.
As for specialized latches, on a standby they are already busy with WAL
replay.
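
For what it's worth, a rough sketch of a wait loop on the standard
process latch (MyLatch), in the style of syncrep.c and using the current
WaitLatch signature with a wait event; target_lsn is assumed to be the
LSN the backend is waiting for:

	for (;;)
	{
		int		rc;

		/* stop as soon as recovery has replayed past the target */
		if (GetXLogReplayRecPtr(NULL) >= target_lsn)
			break;

		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   1000L, PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);
		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);
		CHECK_FOR_INTERRUPTS();
	}

Since SIGUSR1 sets MyLatch anyway, no extra latch ownership and no
spinlock-protected latch array would be needed on the waiter side.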

4)
> + for (i = 0; i <= state->backend_maxid; i++)
> + {
> + SpinLockAcquire(&state->l_arr[i].slock);
> + if (state->l_arr[i].pid != 0)
> + SetLatch(&state->l_arr[i].latch);
> + SpinLockRelease(&state->l_arr[i].slock);
> + }
>
> Once we get through the update-skipping logic above, we hit this loop
> which acquires spinlocks for every backend one after another and sets
> the latches of every backend, no matter whether they are waiting for
> the LSN that has been applied. Assuming we go with this
> scan-the-whole-array approach, why not include the LSN waited for in
> the array slots, so that we can avoid disturbing processes that are
> waiting for a later LSN?

Done.
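
For the record, a sketch of what the reworked loop might look like with
a per-slot target LSN; the trg_lsn field and cur_lsn (the just-replayed
LSN) are illustrative names, not necessarily what the patch uses:

	for (i = 0; i <= state->backend_maxid; i++)
	{
		SpinLockAcquire(&state->l_arr[i].slock);
		if (state->l_arr[i].pid != 0 &&
			!XLogRecPtrIsInvalid(state->l_arr[i].trg_lsn) &&
			state->l_arr[i].trg_lsn <= cur_lsn)
			SetLatch(&state->l_arr[i].latch);	/* wake only satisfied waiters */
		SpinLockRelease(&state->l_arr[i].slock);
	}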

> Could you talk a bit about the trade-off between this approach and a
> queue based approach? I would like to understand whether this really
> is the best way to do it.
> One way to use a queue would be
> ConditionVariableBroadcast(&state->lsn_moved), and then waiters would
> use a condition variable wait loop. That would make your code much
> simpler (you wouldn't even need your array of spinlocks + latches) but
> would still have the problem of processes waking up even though
> they're actually waiting for a later LSN. Another choice would be to
> create a custom wait list which actually holds the LSNs waited for in
> sorted order, so that we wake up exactly the right processes, cheaply,
> or in arbitrary order which makes insertion cheaper but search more
> expensive.

I'll think about how to implement waiting for an LSN with a queue in the
next patch.
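
As a starting point, the condition-variable variant could be as simple
as the following sketch, using the state->lsn_moved name from your
message; it would still wake backends that are waiting for a later LSN,
as you note:

	/* waiter side */
	ConditionVariablePrepareToSleep(&state->lsn_moved);
	while (GetXLogReplayRecPtr(NULL) < target_lsn)
		ConditionVariableSleep(&state->lsn_moved, PG_WAIT_EXTENSION);
	ConditionVariableCancelSleep();

	/* replay side, after lastReplayedEndRecPtr is advanced */
	ConditionVariableBroadcast(&state->lsn_moved);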

Thank you for your review.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
