Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2018-11-24 23:13:42
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

Attached is v8 of the patch series, which has some relatively minor changes:

* A new commit adds an artificial tie-breaker column to pg_depend
indexes, comprehensively solving the issues with regression test
instability. This is the only really notable change.

* Clean-up of how the design in described in the nbtree README, and
elsewhere. I want to make it clear that we're now more or less using
the Lehman and Yao design. I re-read the Lehman and Yao paper to make
sure that the patch acknowledges what Lehman and Yao say to expect, at
least in cases that seemed to matter.

* Stricter verification by contrib/amcheck. Not likely to catch a case
that wouldn't have been caught by previous revisions, but should make
the design a bit clearer to somebody following L&Y.

* Tweaks to how _bt_findsplitloc() accumulates candidate split points.
We're less aggressive in choosing a smaller tuple during an internal
page split in this revision.

The overall impact of the pg_depend change is that required regression
test output changes are *far* less numerous than they were in v7.
There are now only trivial differences in the output order of items.
And, there are very few diagnostic message changes overall -- we see
exactly 5 changes now, rather than dozens. Importantly, there is no
longer any question about whether I could make diagnostic messages
less useful to users, because the existing behavior for
findDependentObjects() is retained. This is an independent
improvement, since it fixes an independent problem with test
flappiness that we've been papering-over for some time [2] -- I make
the required order actually-deterministic, removing heap TID ordering
as a factor that can cause seemingly-random regression test failures
on slow/overloaded buildfarm animals.

Robert Haas remarked that he thought that the pg_depend index
tie-breaker commit's approach is acceptable [1] -- see the other
thread that Robert weighed in on for all the gory details. The patch's
draft commit message may also be interesting. Note that adding a new
column turns out to have *zero* storage overhead, because we only ever
end up filling up space that was already getting lost to alignment.

The pg_depend thing is clearly a kludge. It's ugly, though in no small
part because it acknowledges the existing reality of how
findDependentObjects() already depends on scan order. I'm optimistic
that I'll be able to push this groundwork commit before too long; it
doesn't hinge on whether or not the nbtree patches are any good.

Peter Geoghegan

Attachment Content-Type Size
v8-0003-Pick-nbtree-split-points-discerningly.patch text/x-patch 42.2 KB
v8-0004-Add-split-at-new-tuple-page-split-optimization.patch text/x-patch 12.1 KB
v8-0005-Add-high-key-continuescan-optimization.patch text/x-patch 8.1 KB
v8-0006-DEBUG-Add-pageinspect-instrumentation.patch text/x-patch 7.8 KB
v8-0002-Treat-heap-TID-as-part-of-the-nbtree-key-space.patch text/x-patch 162.8 KB
v8-0001-Add-pg_depend-index-scan-tiebreaker-column.patch text/x-patch 15.0 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-11-24 23:59:37 Re: pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.
Previous Message Andrew Gierth 2018-11-24 21:54:54 Re: Centralize use of PG_INTXX_MIN/MAX for integer limits