Avoiding pin scan during btree vacuum

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Avoiding pin scan during btree vacuum
Date: 2015-12-21 14:54:22
Message-ID: CANP8+jJuyExr1HnTAdZraWsWkfc-octhug7YPtzPtJcYbyi4pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

During VACUUM of btrees, we need to pin all pages, even those where tuples
are not removed, which I am calling here the "pin scan". This is especially
a problem during redo, where removing one tuple from a 100GB btree can take
a minute or more. That causes replication lags. Bad Thing.

Previously, I have suggested ways of optimizing that and code comments
reflect that. I would like to look at removing the pin scan entirely, on a
standby only.

In
www.postgresql.org/message-id/flat/721615179(dot)3351449(dot)1423959585771(dot)JavaMail(dot)yahoo(at)mail(dot)yahoo(dot)com,
the focus was on removing pins from btrees.
In the commit message for that thread/patch,
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069
we see that there are some preconditions to altering the locking.

"This patch leaves behavior unchanged for some cases, which can be
addressed separately so that each case can be evaluated on its own
merits. These unchanged cases are when a scan uses a non-MVCC
snapshot, an index-only scan, and a scan of a btree index for which
modifications are not WAL-logged. If later patches allow all of
these cases to drop the buffer pin after reading a leaf page, then
the btree vacuum process can be simplified; it will no longer need
the "super-exclusive" lock to delete tuples from a page."

The case for master and standby are different. The standby is simpler, yet
more important to change since the pin scan happens in the foreground.

Looking just at the case for standbys, we see there are 3 cases
* non-WAL logged indexes - does not apply on a standby, so ignore
* non-MVCC snapshot - looks like only TOAST scans are a problem on standbys
* index only scans (IOS) - the analysis of which looks wrong to me...

IOSs always check the visibility before using the tuple. If a tuple is
about to be removed by a VACUUM then the tuple will already be dead and the
visibility map will always be set to not-all-visible. So any tuple being
removed by vacuum can never cause a problem to an IOS. Hence the locking
interactions are not required, at least on standbys, for normal tables.

So it looks like we can skip the "pin scan" during redo unless we are
vacuuming a toast index.

Patch attached.

Notice that the patch does not slacken the requirement to
super-exclusive-lock the block from which tuples are being removed. The
only thing it does is skip past the requirement to pin each of the
intervening blocks where nothing has happened.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
avoid_standby_pin_scan.v1.patch application/octet-stream 1.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-12-21 14:57:04 Re: SET SESSION AUTHORIZATION superuser limitation.
Previous Message Robert Haas 2015-12-21 14:54:07 Re: Freeze avoidance of very large table.