Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [WIP] Zipfian distribution in pgbench)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [WIP] Zipfian distribution in pgbench)
Date: 2017-07-26 03:02:51
Message-ID: CAH2-Wz=SfAKVMv1x9Jh19EJ8am8TZn9f-yECipS9HrrRqSswnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 25, 2017 at 3:02 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I've been thinking about this a lot, because this really does look
> like a pathological case to me. I think that this workload is very
> sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at
> least, I can imagine that mechanism doing a lot better than it
> actually manages to do here. I wonder if it's possible that commit
> 2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf
> pages, should have considered workloads like this.

While the benchmark Alik came up with is non-trivial to reproduce, I
can show a consistent regression for a simple case with only one
active backend. I'm not sure whether or not this is considered an
acceptable trade-off -- I didn't look through the archives from around
March of 2015 just yet.

Setup:

Initialize pgbench (any scale factor).
create index on pgbench_accounts (aid);

The point of this example is to show a simple query that is never
quite HOT-safe, where we need to create a new index entry in an index
for that reason alone. Theoretically, only one index needs to be
updated, not all indexes, because only one index covers attributes
that have truly changed. For example, if we had WARM, I think that
many of these theoretically unnecessary index tuple insertions would
not happen. When they do happen, because we don't have something like
WARM, it seems important that the kill_prior_tuples/LP_DEAD stuff does
its job. I don't think that it will consistently do that, though.

Steps:

(Set debugger breakpoint from within _bt_killitems())

Test queries (run these from a single interactive psql session):

update pgbench_accounts set abalance = abalance + 1 where aid = 763;
update pgbench_accounts set abalance = abalance + 1 where aid = 763;
update pgbench_accounts set abalance = abalance + 1 where aid = 763;

We now see that no update ever kills items within _bt_killitems(),
because our own update to the index leaf page itself nullifies our
ability to kill anything, by changing the page LSN from the one
stashed in the index scan state variable. Fortunately, we are not
really "self-blocking" dead item cleanup here, because the
_bt_check_unique() logic for killing all_dead index entries saves the
day by not caring about LSNs. However, that only happens because the
index on "aid" is a unique index.

If we drop the primary key, and create a regular index on "aid" in its
place, then the same UPDATE queries really do self-block from killing
index tuples due to changes in 2ed5b87f9, which could be pretty bad if
there wasn't some selects to do the kill_prior_tuple stuff instead. I
verified that this regression against 9.4 exists, just to be sure that
the problem wasn't somehow there all along -- the regression is real.
:-(

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-07-26 03:08:02 Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [WIP] Zipfian distribution in pgbench)
Previous Message Etsuro Fujita 2017-07-26 02:50:05 Re: Mishandling of WCO constraints in direct foreign table modification