Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-10-03 06:35:15
Message-ID: CAJ3gD9fE=0w50sRagcs+jrktBXuJAWGZQdSTMa57CCY+Dh-xbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 30 Sep 2019 at 23:38, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Sep 30, 2019 at 7:35 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > Alright. Attached is the updated patch that splits the file into two
> > files, one that does only xmin related testing, and the other test
> > file that tests conflict recovery scenarios, and also one scenario
> > where drop-database drops the slots on the database on standby.
> > Removed get_slot_xmins() and get_node_from_slotname().
> > Renamed 'replica' to 'standby'.
> > Used node->backup() function instead of pg_basebackup command.
> > Renamed $master_slot to $master_slotname, similarly for $standby_slot.
>
> In general, I think this code is getting a lot clearer and easier to
> understand in these last few revisions.
>
> Why does create_logical_slot_on_standby include sleep(1)? Does the
> test fail if you take that out?
It has not failed for me, but I think sometimes it may happen that the
system command 'pg_recvlogical' is so slow to start that before it
tries to even create the slot, the subsequent checkpoint command
concurrently runs, causing a "running transactions" record to arrive
on standby *before* even pg_recvlogical decides the starting point
from which to receive records. So effectively pg_recvlogical can miss
this record.

> If so, it's probably going to fail on
> the buildfarm even with that included, because some of the buildfarm
> machines are really slow (e.g. because they use CLOBBER_CACHE_ALWAYS,
> or because they're running on a shared system with low hardware
> specifications and an ancient disk).

Yeah right, then it makes sense to explicitly wait for the slot to
calculate the restart_lsn, and only then run the checkpoint command.
Did that now.

>
> Similarly for the sleep(1) just after you VACUUM FREEZE all the databases.
I checked that VACUUM command returns only after updating the
pg_database.datfrozenxid. So now I think it's safe to immediately run
the checkpoint command after vacuum. So removed the sleep() now.

Attached is the updated patch series.

>
> I'm not sure wait the point of the wait_for_xmins() stuff is in
> 019_standby_logical_decoding_conflicts.pl. Isn't that just duplicating
> stuff we've already tested in 018?
Actually, in 019, the function call is more to wait for
hot_standby_feedback to take effect.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
logicaldecodng_standby_v4.tar.gz application/gzip 19.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-10-03 06:37:39 Re: Include RELKIND_TOASTVALUE in get_relkind_objtype
Previous Message Smith, Peter 2019-10-03 06:16:44 RE: Proposal: Make use of C99 designated initialisers for nulls/values arrays