Re: hot_standby_feedback vs excludeVacuum and snapshots

From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hot_standby_feedback vs excludeVacuum and snapshots
Date: 2018-06-07 21:25:01
Message-ID: 20180607212501.2ianznlycwu4kuci@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018-06-07 14:19:18 -0700, Andres Freund wrote:
> Hi,
>
> On 2018-03-29 12:17:24 +0100, Greg Stark wrote:
> > I'm poking around to see debug a vacuuming problem and wondering if
> > I've found something more serious.
> >
> > As far as I can tell the snapshots on HOT standby are built using a
> > list of running xids that the primary builds and puts in the WAL and
> > that seems to include all xids from transactions running in all
> > databases. The HOT standby would then build a snapshot and eventually
> > send the xmin of that snapshot back to the primary in the hot standby
> > feedback and that would block vacuuming tuples that might be visible
> > to the standby.
>
> > Many ages ago Alvaro sweated blood to ensure vacuums could run for
> > long periods of time without holding back the xmin horizon and
> > blocking other vacuums from cleaning up tuples. That's the purpose of
> > the excludeVacuum flag in GetCurrentVirtualXIDs(). That's possible
> > because we know vacuums won't insert any tuples that queries might try
> > to view and also vacuums won't try to perform any sql queries on other
> > tables.
>
> > I can't find anywhere that the standby snapshot building mechanism
> > gets this same information about which xids are actually vacuums that
> > can be ignored when building a snapshot. So I'm concerned that the hot
> > standby sending back its xmin would be effectively undermining this
> > mechanism and forcing vacuum xids to be included in the xmin horizon
> > and prevent vacuuming of tuples.
>
> > Am I missing something obvious? Is this a known problem?
>
> Maybe I'm missing something, but the running transaction data reported
> to the standby does *NOT* include anything about lazy vacuums - they
> don't have an xid. The reason there's PROC_IN_VACUUM etc isn't the xid,
> it's *xmin*, no?
>
> We currently do acquire an xid when truncating the relation - but I
> think it'd somewhat fair to argue that that's somewhat of a bug. The
> reason a log is acquired is that we need to log AEL locks, and that
> currently means they have to be assigned to a transaction.
>
> Given that the truncation happens at the end of VACUUM and it *NEEDS* to
> be present on the standby - otherwise the locking stuff is useless - I
> don't think the fix commited in this thread is correct.
>
> Wonder if the right thing here wouldn't be to instead transiently
> acquire an AEL lock during replay when truncating a relation?

Isn't the fact that vacuum truncation requires an AEL, and that the
change committed today excludes those transactions from running xacts
records, flat out broken?

Look at:

void
ProcArrayApplyRecoveryInfo(RunningTransactions running)
...
/*
* Remove stale locks, if any.
*
* Locks are always assigned to the toplevel xid so we don't need to care
* about subxcnt/subxids (and by extension not about ->suboverflowed).
*/
StandbyReleaseOldLocks(running->xcnt, running->xids);

by excluding running transactions you have, as far as I can tell,
effectively removed the vacuum truncation AEL from the standby. Now that
only happens when a running xact record is logged, but as that happens
in the background...

I also don't understand why this change would be backpatched in the
first place. It's a relatively minor efficiency thing, no?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Asim R P 2018-06-07 21:34:22 Column store in Greenplum
Previous Message Alvaro Herrera 2018-06-07 21:23:57 Re: SHOW ALL does not honor pg_read_all_settings membership