Re: pg_receivewal starting position

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, sawada(dot)mshk(at)gmail(dot)com
Subject: Re: pg_receivewal starting position
Date: 2021-10-28 12:31:36
Message-ID: YXqYKAdVEqmyTltK@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 27, 2021 at 10:11:00AM +0200, Ronan Dunklau wrote:
> Sorry I sent an intermediary version of the patch, here is the correct one.

While looking at this patch, I have figured out a simple way to make
the tests faster without impacting the coverage. The size of the WAL
segments archived is a bit of a bottleneck as they need to be zeroed
by pg_receivewal at creation. This finishes by being a waste of time,
even if we don't flush the data. So I'd like to switch the test so as
we use a segment size of 1MB, first.

A second thing is that we copy too many segments than really needed
when using the slot's restart_lsn as starting point as RESERVE_WAL
would use the current redo location, so it seems to me that a
checkpoint is in order before the slot creation. A third thing is
that generating some extra data after the end LSN we want to use makes
the test much faster at the end.

With those three methods combined, the test goes down from 11s to 9s
here. Attached is a patch I'd like to apply to make the test
cheaper.

I also had a look at your patch. Here are some comments.

+# Cleanup the previous stream directories to reuse them
+unlink glob "'${stream_dir}/*'";
+unlink glob "'${slot_dir}/*'";
I think that we'd better choose a different location for the
archives. Keeping the segments of the previous tests is useful for
debugging if a previous step of the test failed.

+$standby->psql('',
+ "CREATE_REPLICATION_SLOT $folder_slot PHYSICAL (RESERVE_WAL)",
+ replication => 1);
Here as well we could use a restart point to reduce the number of
segments archived.

+# Now, try to resume after the promotion, from the folder.
+$standby->command_ok(
+ [ 'pg_receivewal', '-D', $stream_dir, '--verbose', '--endpos', $nextlsn,
+ '--slot', $folder_slot, '--no-sync'],
+ "Stream some wal after promoting, resuming from the folder's position");
What is called the "resume-from-folder" test in the patch is IMO
too costly and brings little extra value, requiring two commands of
pg_receivewal (one to populate the folder and one to test the TLI jump
from the previously-populated point) to test basically the same thing
as when the starting point is taken from the slot. Except that
restarting from the slot is twice cheaper. The point of the
resume-from-folder case is to make sure that we restart from the point
of the archive folder rather than the slot's restart_lsn, but your
test fails this part, in fact, because the first command populating
the archive folder also uses "--slot $folder_slot", updating the
slot's restart_lsn before the second pg_receivewal uses this slot
again.

I think, as a whole, that testing for the case where an archive folder
is populated (without a slot!), followed by a second command where we
use a slot that has a restart_lsn older than the archive's location,
to not be that interesting. If we include such a test, there is no
need to include that within the TLI jump part, in my opinion. So I
think that we had better drop this part of the patch, and keep only
the case where we resume from a slot for the TLI jump.

The commands of pg_receivewal included in the test had better use -n
so as there is no infinite loop on failure.
--
Michael

Attachment Content-Type Size
receivewal-tap-cheaper.patch text/x-diff 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-10-28 12:41:59 Re: Isn't it better with "autovacuum worker...." instead of "worker took too long to start; canceled" specific to "auto
Previous Message vignesh C 2021-10-28 10:46:51 Re: Skipping logical replication transactions on subscriber side